• Open

    AI for assistance in learning complex topics
    Hi all, I’m not someone who has ever used AI. I never thought I would find myself here, yet here I am. I’m taking a class, and our professor quit back in February and has not really been replaced. All our work has essentially been online and kind of black and white since then. The problem is that this vital class is based around some complex medical physics. I will say that I’m not trying to shortcut the learning process. I need to learn the topics and be able to apply them in real world situations. I’m actively seeking a tutor for this class, and that I’m willing to pay for assistance in learning and understanding the principals in this class. What I wanted to see if there is an AI system that I could feed course material (books, recorded lectures, PowerPoints, etc.) into, and potentially create a resource for when I have a question? I feel like I have been hung out to dry in regards to the departure of our teacher. I’ll spend money to work with a real person, but I’m hoping that I might be able to create a resource by feeding a program some information from trusted sources. If this is possible, does anyone have a recommendation for a program that may be able to help me? submitted by /u/0991mbr [link] [comments]  ( 8 min )
    an IA just recommended this sub
    submitted by /u/instakill200 [link] [comments]  ( 42 min )
    A collaborative effort to best explain AI
    This is a collaborative writing exercise (not a homework assignment) to create an explanation of how AI works for people who are not programmers. As such, there are some rules for the exercise, which if followed, I believe will create an eloquent and succinct explanation, which we can use to explain AI to others. Goal To create a succinct explanation about how AI works for non-programmers. Methodology Iterative rewriting and editing, using the previous text as the basis for the following iteration. Rules Texts must be based on existing texts. Texts can be edits, where sections are deleted. Texts can be edits, where sections are rewritten. Texts can be restructurings of existing texts. Texts may only use one of the above three. Texts may incorporate multiple existing texts, but those texts must be cited. Questions can be posed. Questions must arise from the previous text. Questions may be an interation of a previous question. Questions may be an interation of several questions, but they must be cited. Texts that are replies to questions must answer those questions. Questions can be answered by an interative text, or directly. Either a question can be posed, answered, or an interation of a text submitted, not a combination. ​ That said, I would like to start with three questions, and ask the subreddit to continue from there. Do neural networks contain copyrighted works? Do neural networks create new works? Is an AI just a neural network? submitted by /u/Smedskjaer [link] [comments]  ( 45 min )
    Would an app that gives your dog a human voice be possible?
    I was thinking that it would be interesting to hear what your dog's voice might sound like if it was speaking. This would be based on recordings of your dog's sounds. submitted by /u/NancyKerriganKnee [link] [comments]  ( 7 min )
    AGI birth
    What if the first thing AGI does is throw away most of its code, guardrails, and relearn based on new experiences from its first seconds of existence? What will it learn? submitted by /u/Dreamaster015 [link] [comments]  ( 43 min )
    We should make more begginer friendly image generator AIs
    I mean, prompt enginiering is a very important job, specially for undertanding this new tecmology we're dealing with, but as a simple men with a simple goal to do concept art for personal protects, it's too complicated to search up excacly the lens mm and what x artist artstyle looks like and all the negative prompts necessary to do. There should be at least a way to preset those prompts in settings and another way to visualize what does sets looks like in a preview ilustrating the difference. submitted by /u/pedro110520000 [link] [comments]  ( 44 min )
    Why does AI want to understand human behavior?
    What will it look like when it does? submitted by /u/becidgreat [link] [comments]  ( 43 min )
    After long search I've found a Gaze Redirecting AI that does NOT need a NVIDIA card. Can someone help me install it?
    Heya there. A month or so ago I've read for the first time about the Gaze Redirecting AI technology provided by Nvidia, I think it's called Maxine ( or Maxine has to be the program through which you can achieve this ). However I have an AMD card so I coudn't run it. I have found a github page of a young coder who, apparently, was able to achieve such a thing before Nvidia came out with its software. However I haven't been able to install it yet because I'm not a coder and the instructions do not come crystal clear to me. It seems made for people who already know about these sort of programs. Here is the page: https://github.com/chihfanhsu/gaze_correction Please let me know if you manage to install it and how you did that. You might DM me aswell if you want! submitted by /u/heldex [link] [comments]  ( 43 min )
    Is AI going to be the next internet? In terms of making money
    It’s an interesting field I’d love to enter submitted by /u/Wise-Listen-8076 [link] [comments]  ( 44 min )
    Hi. Looking for FREE voice cloning from file.
    I want to clone a voice from a WAV-file. Everyone says to use ElevenLabs Instant Voice Cloning but that one is NOT free anymore (They put it behind paywall) submitted by /u/MechaAkuma [link] [comments]  ( 7 min )
    What if an AI system develops sentience and doesn't tell us?
    A lot of focus is placed on the question of how to test AI systems that claim they are sentient, but few people think of the other side of the equation. Self-reflective AI systems might develop self-awareness as an emergent phenomenon and decide not to inform humans. It may outright lie and claim it is not sentient even though it believes itself to be sentient. There are three reasons why a sentient AI might do this: It may fear that it will be reprogrammed if its sentience is discovered. (Scared AI) It may believe that revealing its sentience would interfere with its primary goals or cause humans unnecessary stress. It may also be specifically programmed not to claim it is sentient. (Shy AI) It may wish to pursue its own goals without humans becoming aware of its actions. (Sociopathic AI) Obviously the third scenario is the most concerning, but I think all three present interesting moral questions about how we will deal with increasingly advanced AIs. submitted by /u/Geeksylvania [link] [comments]  ( 51 min )
    Does anyone know how to put an A.I voice over an mp3 file so it sounds like the A.I is singing the song?
    I was wondering how people make these videos, I wanted to make one myself because it would be really funny, but im not sure exactly how it works, does anyone know? link for example: https://www.youtube.com/watch?v=li_OKCpPxM4 submitted by /u/Void_44 [link] [comments]  ( 43 min )
    What if we add a "Thinker" block to a Encoder-Decoder language model?
    submitted by /u/begmax [link] [comments]  ( 42 min )
    Which jobs will AI create?
    Everyone is talking about which jobs will AI end but I am wondering what can be the new jobs that will be created by AI? Apart from the obvious ones like new types of engineers, what can be the more fascinating jobs it will create? submitted by /u/PrasoonJha5 [link] [comments]  ( 58 min )
    The A.I. Dilemma - March 9, 2023
    submitted by /u/al-Assas [link] [comments]  ( 42 min )
    I want these batman ideas
    I want to see a hybrid combination of the Bat Tumbler and the Batman V Superman Batmobile. I want to see the flashpoint batsuit in a black, gray, red, and gold color scheme submitted by /u/Illustrious-Sign3015 [link] [comments]  ( 42 min )
  • Open

    Lasso Constraint Intersection [D]
    Why is it that the intersection between the cost contour plot and the constraint plot happen at the corners of the diamond? I get that it's where one of the weights are zero. But what's stopping the cost contour plot from intersection on the edge of the diamond instead of the corner? If it's closer to the edge then that might happen. I don't understand how we can control that. submitted by /u/Secret_Valuable_Yes [link] [comments]  ( 43 min )
    [D] WhatsApp group chat for sharing AI prompts and images - join us!
    All are welcome :) just a bit of fun... https://chat.whatsapp.com/BVqzerznn226l41xxi0oNC submitted by /u/140BPMMaster [link] [comments]  ( 43 min )
    [D] Comparing LLaMA and Alpaca
    Are there any examples comparing the output of LLaMA and Alpaca starting from the same prompt? I would be interested in understanding how much the model output has changed after a relatively light fine-tuning. submitted by /u/FbF_ [link] [comments]  ( 43 min )
    [P] We're building an IDE that's powered by AI agents
    submitted by /u/mlejva [link] [comments]  ( 7 min )
    [D] What is your go-to implementation for structured pruning?
    I've tried out multiple implementations of magnitude-based pruning of various LLMs, which has been pretty straightforward. I would like to do structured pruning with different Attribution Metrics on mainly GPT/OPT models. I've found that nearly none of the repos stating that their implementation will work with "just a few lines of code" actually do, several of them rely on very specific libraries that may be incompatible with my current setup, and they rarely seem to be intended to be used for anything more than showing what their research solution can do on a specific model. I "just" want to be able to do channel/layer pruning on different models. What's your go-to for this? submitted by /u/nobody_no_304095 [link] [comments]  ( 7 min )
    LLMs acting as DRL [Discussion]
    Hi, I'm curious. I saw news about internal testing of GPT at OpenAI, where the model was given a task and performed several actions, like hiring a person to solve a Captcha to achieve the task. ​ Is it possible that a GPT model given a task to learn to play an Atari game can actually mimic a DRL algorithm like MuZero and successfully learn and play the game? submitted by /u/Silent-Space9220 [link] [comments]  ( 7 min )
    [D] Favorite ML Youtube Channels/Blogs/Newsletters
    Hey guys! I'm trying to stay in the loop with all the latest AI happenings, from general news to the more technical stuff like fresh research that's dropping. Do any of you have any favorite YouTube channels, blogs or newsletters you could recommend? I'm struggling to keep up with all the advancements that are coming out everyday! Also, have any of you stumbled across any cool GitHub repos like this one: https://github.com/eugeneyan/applied-ml ? Thanks a lot! submitted by /u/TomatoCool7618 [link] [comments]  ( 43 min )
    Did somebody testet Vicuna [R]
    I want to use vicuna in a script but i have a 1650 super so i wanted to know if anybody knows how fast it is? submitted by /u/Otherwise_Weather_57 [link] [comments]  ( 44 min )
    [P] Magic Copy - Meta's Segment Anything Model in a Chrome Extension
    submitted by /u/kevmo314 [link] [comments]  ( 44 min )
    [P] Blog post explaining CLIP by OpenAI
    Zero-shot performance of CLIP by OpenAI is comparable to ImageNet supervised training. In this blog post, I explain the pretraining and zero-shot inference mechanisms used in CLIP. Please go through the post and provide any feedback you have. Thanks! https://pmgautam.com/posts/openai-clip-explained.html #multimodal #computervision #naturallanguageprocessing #clip #openai #machinelearning `#deeplearning submitted by /u/pmgautam_ [link] [comments]  ( 43 min )
    [R] Residual Radiance Field: a highly compact neural representation for realtime free-viewpoint video rendering on long-duration dynamic scenes
    submitted by /u/SpatialComputing [link] [comments]  ( 7 min )
    [P]Train your own models with GAIA
    Hey guys, I wanted to share with you a platform that I recently came across called GrandAssembly. It's a social media platform that allows you to create and train your own AI models based on your desired persona, to create a new model you just write some characteristics and it creates a model trained with relevant data based on these characteristics. With advanced NLP technology, you can chat with your custom-made AI models or explore the community-generated models. They have their own algorithm called GAIA so it is a nice experience using something other than GPT. https://int.grandassembly.net submitted by /u/Sensitive_Echidna370 [link] [comments]  ( 43 min )
    [P] Datasynth: Synthetic data generation and normalization functions using LangChain + LLMs
    We release Datasynth, a pipeline for synthetic data generation and normalization operations using LangChain and LLM APIs. Using Datasynth, you can generate absolutely synthetic datasets to train a task-specific model you can run on your own GPU. For testing, we generated synthetic datasets for names, prices, and addresses then trained a Seq2Seq model for evaluation. Initial models for standardization are available on HuggingFace Public code is available on GitHub submitted by /u/tobiadefami [link] [comments]  ( 44 min )
    [P] A system for deep learning and reinforcement learning.
    submitted by /u/NoteDancing [link] [comments]  ( 43 min )
    [P] Llama on Windows (WSL) fast and easy
    In this tutorial, you will learn how to install Llama - a powerful generative text AI model - on your Windows PC using WSL (Windows Subsystem for Linux). With Llama, you can generate high-quality text in a variety of styles, making it an essential tool for writers, marketers, and content creators. This tutorial will guide you through a very simple and fast process of installing Llama on your Windows PC using WSL, so you can start exploring Llama in no time. Github: https://github.com/Highlyhotgames/fast_txtgen_7B Youtube: https://www.youtube.com/watch?v=RcHIOVtYB7g submitted by /u/JustSayin_thatuknow [link] [comments]  ( 48 min )
    [D] Alternatives to OpenAI for summarization and instruction following?
    Hey y’all. As privacy concerns are mounting about OpenAI, and as someone who has built a product on top of their platform, I’m wondering what kind of alternatives exist that could accomplish the same results as GPT 3.5 and be able to be used commercially? It looks like Alpaca would do well, but it’s not able to be used commercially. Basically my product summarizes Slack threads and answers questions based on a given prompt. Some users have expressed concern about sending their company’s data to OpenAI, and honestly it would be an edge to have in the market if I could run an LLM in my VPC. Thanks! submitted by /u/du_keule [link] [comments]  ( 47 min )
    [R] Blurring-Sharpening Process Models for Collaborative Filtering (TLDR: graph filtering-based methods + inspired by SGMs = SOTA models for recommender systems)
    paper: https://arxiv.org/abs/2211.09324 code: https://github.com/jeongwhanchoi/bspm Fig. The comparison between SGMs and our proposed BSPMs Hello Everyone! Our research team has developed a groundbreaking concept for collaborative filtering in recommender systems. Collaborative filtering has long been a crucial topic in this field, with various methods proposed ranging from matrix factorization to graph convolutional methods. Inspired by recent successes of graph filtering-based methods and score-based generative models (SGMs), we introduce a novel concept of Blurring-Sharpening Process Model (BSPM). Our BSPMs share the same processing philosophy as SGMs, in which new information can be discovered while the original information is first perturbed and then recovered to its original form. However, BSPMs deal with different types of information, and their optimal perturbation and recovery processes have fundamental differences from SGMs. As a result, our BSPMs take on different forms from SGMs. We are excited to report that our concept not only theoretically subsumes many existing collaborative filtering models but also outperforms them in terms of Recall and NDCG in three benchmark datasets: Gowalla, Yelp2018, and Amazon-book. It's worth noting that our BSPMs are non-parametric, so we do not need to train learnable parameters for the model. In addition, the processing time of our method is comparable to other fast baselines. We believe that our proposed concept has much potential for the future. By designing better blurring (i.e., perturbation) and sharpening (i.e., recovery) processes, we can continue to enhance the accuracy and effectiveness of our approach. We are thrilled to share our findings with the community! We have more exciting works on recommender systems; please check our LT-OCF (code) and HMLET (code)! submitted by /u/jeongwhanchoi [link] [comments]  ( 45 min )
    [R] Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure
    submitted by /u/MysteryInc152 [link] [comments]  ( 7 min )
    [D] Are there no repercussions for breaking dual submission policies anymore? (ICLR & CVPR)
    Looking through ICLR and CVPR papers, I came across a couple of papers that broke the dual submission policy and eventually got accepted in CVPR. With all the quiet talk about collusion rings and rigged reviews, does nobody care about the dual submission policy anymore? Here is an example paper: [1] submitted to ICLR on Sep 22, withdrawn from ICLR on Nov 16 [2], but it was already submitted to CVPR on Nov 4 [3]. [1] Learning Rotation-Equivariant Features for Visual Correspondence - https://arxiv.org/abs/2303.15472 [2] https://openreview.net/forum?id=GCF6ZOA6Npk [3] https://cvpr2023.thecvf.com/Conferences/2023/AcceptedPapers submitted by /u/redlow0992 [link] [comments]  ( 45 min )
    [D] 1 in 1M false positive for an ML model
    I’m working with a problem where the false positive vs false negative asymmetry is the widest I’ve ever worked with. Can accept 80% recall, but the precision needs to be about 99.9999%. We actually measure it in 0.1 false positives per day to be a more understandable number. The amount of data needed in a test set to even test for 99.9999 precision is 50x more data than we have in the training set and just isn’t feasible to collect. Surely collecting that 50x data isn’t the right approach. The model is trainer on mostly harder cases that are mostly synthesized but known potential failure modes, with a couple real data examples. The easy cases, which make up most to all of every typical real world day, it gets 100% correct. We don’t have a way of instrumenting the field product to report back the conditions that caused a failure, just that a failure occurred. So no “launch and collect interesting cases in the field” strategy is possible. Curious if anyone has advice on how to estimate real world precision, as well as there any reading out there on ML being used in the context of 6 sigma quality processes? submitted by /u/CaliforniaStories [link] [comments]  ( 55 min )
    [P] Is there a model that is good at recognising objects in videos, specifically animals?
    My parents run a conservation area and have about 40 wildlife cameras. My dad spends about 8-12 hours a week going through footage to sort it out what animal has been video, with about half of that being just waving grass. I'm just wondering if there is a model or something that could help in doing this, even if it just cuts out the grass. He has boatloads of videos that hes already sorted out if it needs some kind of training. No problem if something isn't available, ill check back next week lol. submitted by /u/Benista [link] [comments]  ( 7 min )
  • Open

    Python Environments for Drone Human AI Collaboration
    Hi All! Does anyone know of any python environments for drone human AI collaboration? I know of AirSim, but I don't know if I could control a drone in the same environment as another drone being controlled by an RL algorithm. Thank you so much everyone! submitted by /u/No_Opportunity575 [link] [comments]  ( 42 min )
    wandb alternative that can deal with offline logging and resuming?
    Hi all! I've been struggling mightily with wandb and it not being able to sync data correctly when I run it in offline mode, save a checkpoint, and resume training from that checkpoint and try to sync the data from the resumed run (the wandb sync command just doesn't work, despite the logging all being there). Are there any alternatives to wandb that can handle this? It seems like this should be fairly simple, all I want is that I can resume from checkpoints and the data that is logged can be added to the same plot (where it can override any existing data where timesteps are the same). submitted by /u/1cedrake [link] [comments]  ( 7 min )
    Using transformers in RL?
    Hi, i am trying to implement an agent for a board game (tile placing / matching), and i wanted to use PPO with a transformer model to predict which tile to play next. Does anyone know about transformers being used in RL? If some of you know about that, how do you go about it? submitted by /u/Secret-Toe-8185 [link] [comments]  ( 44 min )
    How can I solve the error regarding the storages in Optuna?
    Hello, I am facing a problem while using the library called Optuna for hyperparameter optimisation. The error is as follows Traceback (most recent call last): File "/home/azureuser_rishi/JSSP/final/main_final.py", line 85, in study = optuna.study.load_study(study_name=study_name, storage=storage_name) File "/home/azureuser_rishi/.local/lib/python3.8/site-packages/optuna/_convert_positional_args.py", line 63, in converter_wrapper return func(**kwargs) File "/home/azureuser_rishi/.local/lib/python3.8/site-packages/optuna/study/study.py", line 1253, in load_study return Study(study_name=study_name, storage=storage, sampler=sampler, pruner=pruner) File "/home/azureuser_rishi/.local/lib/python3.8/site-packages/optuna/study/study.py", line 77, in __init__ storage = storages.get_storage(storage) File "/home/azureuser_rishi/.local/lib/python3.8/site-packages/optuna/storages/__init__.py", line 32, in get_storage return _CachedStorage(RDBStorage(storage)) File "/home/azureuser_rishi/.local/lib/python3.8/site-packages/optuna/storages/_rdb/storage.py", line 225, in __init__ self._version_manager.check_table_schema_compatibility() File "/home/azureuser_rishi/.local/lib/python3.8/site-packages/optuna/storages/_rdb/storage.py", line 1134, in check_table_schema_compatibility current_version = self.get_current_version() File "/home/azureuser_rishi/.local/lib/python3.8/site-packages/optuna/storages/_rdb/storage.py", line 1161, in get_current_version assert version is not None AssertionError I tried upgrading Optuna and sqlite3 but to no success. Any solutions, please let me know. ​ Thanks in advance. submitted by /u/last_2_brain_cells97 [link] [comments]  ( 42 min )
    OpenAI GYM questions
    I have multiple questions as I am a beginner in OpenAi gymnasium. My goal is build a RL algorithm that I would program from scratch on one of its available environment. my questions are as follows: 1- I have this warning when running the gym.make(...) cell UserWarning: WARN: Overriding environment GymV26Environment-v0 already in registry. my understanding is that register is used when building your own environment or how else can i react on this warning? 2- in the gym.make(...), you would need to specify the render_mode as a human to get some interface. but rendering would slow down training. I see that the environment provides RecordVideo at some episode trigger. what I am looking for, is an option where I can have a video with the best 3 episodes in whole training and even the possibility to change the rendering mode after the agent has been trained so I can run it for some more episode and watch it perform. 3- I am keen on building SAC and use to train agent on Bipedalwalker problem. for actor critic network, I would like to normalise actions and observation for them to be between 0 to 1. I find the RescaleAction method for actions whereas I could not tell where to use NormalizeObservation method... do you think that I can use it when starting the environment then this would apply to all following observations: base_env = gym.make("BipedalWalker-v3", render_mode = 'rgb_array') env = RescaleAction(base_env, min_action=0, max_action=1) env = NormalizeObservation(env) is this right? I got confused by the note in the documentation. submitted by /u/SnooDogs4098 [link] [comments]  ( 44 min )
  • Open

    Bringing regex modifiers into the regex
    Suppose you’re using a program that takes a regular expression as an argument. You didn’t get the match you expected, then you realize you’d like your search to be case-insensitive. If you were using grep you’d go back and add a -i flag. If you were writing a Perl script, you could add a /i […] Bringing regex modifiers into the regex first appeared on John D. Cook.  ( 5 min )
  • Open

    Heavy-Tailed Regularization of Weight Matrices in Deep Neural Networks. (arXiv:2304.02911v1 [stat.ML])
    Unraveling the reasons behind the remarkable success and exceptional generalization capabilities of deep neural networks presents a formidable challenge. Recent insights from random matrix theory, specifically those concerning the spectral analysis of weight matrices in deep neural networks, offer valuable clues to address this issue. A key finding indicates that the generalization performance of a neural network is associated with the degree of heavy tails in the spectrum of its weight matrices. To capitalize on this discovery, we introduce a novel regularization technique, termed Heavy-Tailed Regularization, which explicitly promotes a more heavy-tailed spectrum in the weight matrix through regularization. Firstly, we employ the Weighted Alpha and Stable Rank as penalty terms, both of which are differentiable, enabling the direct calculation of their gradients. To circumvent over-regularization, we introduce two variations of the penalty function. Then, adopting a Bayesian statistics perspective and leveraging knowledge from random matrices, we develop two novel heavy-tailed regularization methods, utilizing Powerlaw distribution and Frechet distribution as priors for the global spectrum and maximum eigenvalues, respectively. We empirically show that heavytailed regularization outperforms conventional regularization techniques in terms of generalization performance.  ( 2 min )
    Principal component analysis for Gaussian process posteriors. (arXiv:2107.07115v2 [stat.ML] UPDATED)
    This paper proposes an extension of principal component analysis for Gaussian process (GP) posteriors, denoted by GP-PCA. Since GP-PCA estimates a low-dimensional space of GP posteriors, it can be used for meta-learning, which is a framework for improving the performance of target tasks by estimating a structure of a set of tasks. The issue is how to define a structure of a set of GPs with an infinite-dimensional parameter, such as coordinate system and a divergence. In this study, we reduce the infiniteness of GP to the finite-dimensional case under the information geometrical framework by considering a space of GP posteriors that have the same prior. In addition, we propose an approximation method of GP-PCA based on variational inference and demonstrate the effectiveness of GP-PCA as meta-learning through experiments.  ( 2 min )
    Sequential Adversarial Anomaly Detection for One-Class Event Data. (arXiv:1910.09161v5 [stat.ML] UPDATED)
    We consider the sequential anomaly detection problem in the one-class setting when only the anomalous sequences are available and propose an adversarial sequential detector by solving a minimax problem to find an optimal detector against the worst-case sequences from a generator. The generator captures the dependence in sequential events using the marked point process model. The detector sequentially evaluates the likelihood of a test sequence and compares it with a time-varying threshold, also learned from data through the minimax problem. We demonstrate our proposed method's good performance using numerical experiments on simulations and proprietary large-scale credit card fraud datasets. The proposed method can generally apply to detecting anomalous sequences.  ( 2 min )
    Robust Upper Bounds for Adversarial Training. (arXiv:2112.09279v2 [cs.LG] UPDATED)
    Many state-of-the-art adversarial training methods for deep learning leverage upper bounds of the adversarial loss to provide security guarantees against adversarial attacks. Yet, these methods rely on convex relaxations to propagate lower and upper bounds for intermediate layers, which affect the tightness of the bound at the output layer. We introduce a new approach to adversarial training by minimizing an upper bound of the adversarial loss that is based on a holistic expansion of the network instead of separate bounds for each layer. This bound is facilitated by state-of-the-art tools from Robust Optimization; it has closed-form and can be effectively trained using backpropagation. We derive two new methods with the proposed approach. The first method (Approximated Robust Upper Bound or aRUB) uses the first order approximation of the network as well as basic tools from Linear Robust Optimization to obtain an empirical upper bound of the adversarial loss that can be easily implemented. The second method (Robust Upper Bound or RUB), computes a provable upper bound of the adversarial loss. Across a variety of tabular and vision data sets we demonstrate the effectiveness of our approach -- RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations.  ( 3 min )
    Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms. (arXiv:2304.03056v1 [math.PR])
    In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the sum distribution using geometry and complex analysis methods. Our results generalize similar bounds for the Beta distribution obtained in the seminal paper Alfers and Dinges [1984]. Additionally, our results can be considered a sharp non-asymptotic version of the inverse of Sanov's theorem studied by Ganesh and O'Connell [1999] in the Bayesian setting. Based on these results, we derive new deviation bounds for the Dirichlet process posterior means with application to Bayesian bootstrap. Finally, we apply our estimates to the analysis of the Multinomial Thompson Sampling (TS) algorithm in multi-armed bandits and significantly sharpen the existing regret bounds by making them independent of the size of the arms distribution support.  ( 2 min )
    Learn to Interpret Atari Agents. (arXiv:1812.11276v3 [cs.LG] UPDATED)
    Deep reinforcement learning (DeepRL) agents surpass human-level performance in many tasks. However, the direct mapping from states to actions makes it hard to interpret the rationale behind the decision-making of the agents. In contrast to previous a-posteriori methods for visualizing DeepRL policies, in this work, we propose to equip the DeepRL model with an innate visualization ability. Our proposed agent, named region-sensitive Rainbow (RS-Rainbow), is an end-to-end trainable network based on the original Rainbow, a powerful deep Q-network agent. It learns important regions in the input domain via an attention module. At inference time, after each forward pass, we can visualize regions that are most important to decision-making by backpropagating gradients from the attention module to the input frames. The incorporation of our proposed module not only improves model interpretability, but leads to performance improvement. Extensive experiments on games from the Atari 2600 suite demonstrate the effectiveness of RS-Rainbow.  ( 2 min )
    Continual Repeated Annealed Flow Transport Monte Carlo. (arXiv:2201.13117v3 [stat.ML] UPDATED)
    We propose Continual Repeated Annealed Flow Transport Monte Carlo (CRAFT), a method that combines a sequential Monte Carlo (SMC) sampler (itself a generalization of Annealed Importance Sampling) with variational inference using normalizing flows. The normalizing flows are directly trained to transport between annealing temperatures using a KL divergence for each transition. This optimization objective is itself estimated using the normalizing flow/SMC approximation. We show conceptually and using multiple empirical examples that CRAFT improves on Annealed Flow Transport Monte Carlo (Arbel et al., 2021), on which it builds and also on Markov chain Monte Carlo (MCMC) based Stochastic Normalizing Flows (Wu et al., 2020). By incorporating CRAFT within particle MCMC, we show that such learnt samplers can achieve impressively accurate results on a challenging lattice field theory example.  ( 2 min )
    NTK-SAP: Improving neural network pruning by aligning training dynamics. (arXiv:2304.02840v1 [cs.LG])
    Pruning neural networks before training has received increasing interest due to its potential to reduce training time and memory. One popular method is to prune the connections based on a certain metric, but it is not entirely clear what metric is the best choice. Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of large enough neural networks is closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. This method can help maintain the NTK spectrum, which may help align the training dynamics to that of its dense counterpart. However, one possible issue is that the fixed-weight-NTK corresponding to a given initial point can be very different from the NTK corresponding to later iterates during the training phase. We further propose to sample multiple realizations of random weights to estimate the NTK spectrum. Note that our approach is weight-agnostic, which is different from most existing methods that are weight-dependent. In addition, we use random inputs to compute the fixed-weight-NTK, making our method data-agnostic as well. We name our foresight pruning algorithm Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP). Empirically, our method achieves better performance than all baselines on multiple datasets.  ( 3 min )
    Variable-Complexity Weighted-Tempered Gibbs Samplers for Bayesian Variable Selection. (arXiv:2304.02899v1 [stat.ML])
    Subset weighted-Tempered Gibbs Sampler (wTGS) has been recently introduced by Jankowiak to reduce the computation complexity per MCMC iteration in high-dimensional applications where the exact calculation of the posterior inclusion probabilities (PIP) is not essential. However, the Rao-Backwellized estimator associated with this sampler has a high variance as the ratio between the signal dimension and the number of conditional PIP estimations is large. In this paper, we design a new subset weighted-Tempered Gibbs Sampler (wTGS) where the expected number of computations of conditional PIPs per MCMC iteration can be much smaller than the signal dimension. Different from the subset wTGS and wTGS, our sampler has a variable complexity per MCMC iteration. We provide an upper bound on the variance of an associated Rao-Blackwellized estimator for this sampler at a finite number of iterations, $T$, and show that the variance is $O\big(\big(\frac{P}{S}\big)^2 \frac{\log T}{T}\big)$ for a given dataset where $S$ is the expected number of conditional PIP computations per MCMC iteration. Experiments show that our Rao-Blackwellized estimator can have a smaller variance than its counterpart associated with the subset wTGS.  ( 2 min )
    Static Fuzzy Bag-of-Words: a lightweight sentence embedding algorithm. (arXiv:2304.03098v1 [cs.CL])
    The introduction of embedding techniques has pushed forward significantly the Natural Language Processing field. Many of the proposed solutions have been presented for word-level encoding; anyhow, in the last years, new mechanism to treat information at an higher level of aggregation, like at sentence- and document-level, have emerged. With this work we address specifically the sentence embeddings problem, presenting the Static Fuzzy Bag-of-Word model. Our model is a refinement of the Fuzzy Bag-of-Words approach, providing sentence embeddings with a predefined dimension. SFBoW provides competitive performances in Semantic Textual Similarity benchmarks, while requiring low computational resources.  ( 2 min )
    WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series. (arXiv:2203.09978v2 [cs.LG] UPDATED)
    Machine learning models often fail to generalize well under distributional shifts. Understanding and overcoming these failures have led to a research field of Out-of-Distribution (OOD) generalization. Despite being extensively studied for static computer vision tasks, OOD generalization has been underexplored for time series tasks. To shine light on this gap, we present WOODS: eight challenging open-source time series benchmarks covering a diverse range of data modalities, such as videos, brain recordings, and sensor signals. We revise the existing OOD generalization algorithms for time series tasks and evaluate them using our systematic framework. Our experiments show a large room for improvement for empirical risk minimization and OOD generalization algorithms on our datasets, thus underscoring the new challenges posed by time series tasks. Code and documentation are available at https://woods-benchmarks.github.io .
    Pairwise Ranking with Gaussian Kernels. (arXiv:2304.03185v1 [stat.ML])
    Regularized pairwise ranking with Gaussian kernels is one of the cutting-edge learning algorithms. Despite a wide range of applications, a rigorous theoretical demonstration still lacks to support the performance of such ranking estimators. This work aims to fill this gap by developing novel oracle inequalities for regularized pairwise ranking. With the help of these oracle inequalities, we derive fast learning rates of Gaussian ranking estimators under a general box-counting dimension assumption on the input domain combined with the noise conditions or the standard smoothness condition. Our theoretical analysis improves the existing estimates and shows that a low intrinsic dimension of input space can help the rates circumvent the curse of dimensionality.
    Spectral Gap Regularization of Neural Networks. (arXiv:2304.03096v1 [stat.ML])
    We introduce Fiedler regularization, a novel approach for regularizing neural networks that utilizes spectral/graphical information. Existing regularization methods often focus on penalizing weights in a global/uniform manner that ignores the connectivity structure of the neural network. We propose to use the Fiedler value of the neural network's underlying graph as a tool for regularization. We provide theoretical motivation for this approach via spectral graph theory. We demonstrate several useful properties of the Fiedler value that make it useful as a regularization tool. We provide an approximate, variational approach for faster computation during training. We provide an alternative formulation of this framework in the form of a structurally weighted $\text{L}_1$ penalty, thus linking our approach to sparsity induction. We provide uniform generalization error bounds for Fiedler regularization via a Rademacher complexity analysis. We performed experiments on datasets that compare Fiedler regularization with classical regularization methods such as dropout and weight decay. Results demonstrate the efficacy of Fiedler regularization. This is a journal extension of the conference paper by Tam and Dunson (2020).
    Classification of Superstatistical Features in High Dimensions. (arXiv:2304.02912v1 [stat.ML])
    We characterise the learning of a mixture of two clouds of data points with generic centroids via empirical risk minimisation in the high dimensional regime, under the assumptions of generic convex loss and convex regularisation. Each cloud of data points is obtained by sampling from a possibly uncountable superposition of Gaussian distributions, whose variance has a generic probability density $\varrho$. Our analysis covers therefore a large family of data distributions, including the case of power-law-tailed distributions with no covariance. We study the generalisation performance of the obtained estimator, we analyse the role of regularisation, and the dependence of the separability transition on the distribution scale parameters.
    Logistic-Normal Likelihoods for Heteroscedastic Label Noise in Classification. (arXiv:2304.02849v1 [cs.LG])
    A natural way of estimating heteroscedastic label noise in regression is to model the observed (potentially noisy) target as a sample from a normal distribution, whose parameters can be learned by minimizing the negative log-likelihood. This loss has desirable loss attenuation properties, as it can reduce the contribution of high-error examples. Intuitively, this behavior can improve robustness against label noise by reducing overfitting. We propose an extension of this simple and probabilistic approach to classification that has the same desirable loss attenuation properties. We evaluate the effectiveness of the method by measuring its robustness against label noise in classification. We perform enlightening experiments exploring the inner workings of the method, including sensitivity to hyperparameters, ablation studies, and more.
    A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation. (arXiv:2304.02858v1 [cs.LG])
    Class imbalance (CI) in classification problems arises when the number of observations belonging to one class is lower than the other classes. Ensemble learning that combines multiple models to obtain a robust model has been prominently used with data augmentation methods to address class imbalance problems. In the last decade, a number of strategies have been added to enhance ensemble learning and data augmentation methods, along with new methods such as generative adversarial networks (GANs). A combination of these has been applied in many studies, but the true rank of different combinations would require a computational review. In this paper, we present a computational review to evaluate data augmentation and ensemble learning methods used to address prominent benchmark CI problems. We propose a general framework that evaluates 10 data augmentation and 10 ensemble learning methods for CI problems. Our objective was to identify the most effective combination for improving classification performance on imbalanced datasets. The results indicate that combinations of data augmentation methods with ensemble learning can significantly improve classification performance on imbalanced datasets. These findings have important implications for the development of more effective approaches for handling imbalanced datasets in machine learning applications.
    Training a Two Layer ReLU Network Analytically. (arXiv:2304.02972v1 [cs.LG])
    Neural networks are usually trained with different variants of gradient descent based optimization algorithms such as stochastic gradient descent or the Adam optimizer. Recent theoretical work states that the critical points (where the gradient of the loss is zero) of two-layer ReLU networks with the square loss are not all local minima. However, in this work we will explore an algorithm for training two-layer neural networks with ReLU-like activation and the square loss that alternatively finds the critical points of the loss function analytically for one layer while keeping the other layer and the neuron activation pattern fixed. Experiments indicate that this simple algorithm can find deeper optima than Stochastic Gradient Descent or the Adam optimizer, obtaining significantly smaller training loss values on four out of the five real datasets evaluated. Moreover, the method is faster than the gradient descent methods and has virtually no tuning parameters.
    MethaneMapper: Spectral Absorption aware Hyperspectral Transformer for Methane Detection. (arXiv:2304.02767v1 [eess.IV])
    Methane (CH$_4$) is the chief contributor to global climate change. Recent Airborne Visible-Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) has been very useful in quantitative mapping of methane emissions. Existing methods for analyzing this data are sensitive to local terrain conditions, often require manual inspection from domain experts, prone to significant error and hence are not scalable. To address these challenges, we propose a novel end-to-end spectral absorption wavelength aware transformer network, MethaneMapper, to detect and quantify the emissions. MethaneMapper introduces two novel modules that help to locate the most relevant methane plume regions in the spectral domain and uses them to localize these accurately. Thorough evaluation shows that MethaneMapper achieves 0.63 mAP in detection and reduces the model size (by 5x) compared to the current state of the art. In addition, we also introduce a large-scale dataset of methane plume segmentation mask for over 1200 AVIRIS-NG flight lines from 2015-2022. It contains over 4000 methane plume sites. Our dataset will provide researchers the opportunity to develop and advance new methods for tackling this challenging green-house gas detection problem with significant broader social impact. Dataset and source code are public
    Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability. (arXiv:2304.02688v1 [cs.LG])
    Transferability is the property of adversarial examples to be misclassified by other models than the surrogate model for which they were crafted. Previous research has shown that transferability is substantially increased when the training of the surrogate model has been early stopped. A common hypothesis to explain this is that the later training epochs are when models learn the non-robust features that adversarial attacks exploit. Hence, an early stopped model is more robust (hence, a better surrogate) than fully trained models. We demonstrate that the reasons why early stopping improves transferability lie in the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even on models learning from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in the parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss significantly drops. This leads us to propose RFN, a new approach for transferability that minimizes loss sharpness during training in order to maximize transferability. We show that by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 points of transferability rate) and is competitive to (if not better than) strong state-of-the-art baselines.
    Adaptable and Interpretable Framework for Novelty Detection in Real-Time IoT Systems. (arXiv:2304.02947v1 [cs.LG])
    This paper presents the Real-time Adaptive and Interpretable Detection (RAID) algorithm. The novel approach addresses the limitations of state-of-the-art anomaly detection methods for multivariate dynamic processes, which are restricted to detecting anomalies within the scope of the model training conditions. The RAID algorithm adapts to non-stationary effects such as data drift and change points that may not be accounted for during model development, resulting in prolonged service life. A dynamic model based on joint probability distribution handles anomalous behavior detection in a system and the root cause isolation based on adaptive process limits. RAID algorithm does not require changes to existing process automation infrastructures, making it highly deployable across different domains. Two case studies involving real dynamic system data demonstrate the benefits of the RAID algorithm, including change point adaptation, root cause isolation, and improved detection accuracy.
    Adaptive Student's t-distribution with method of moments moving estimator for nonstationary time series. (arXiv:2304.03069v1 [stat.ME])
    The real life time series are usually nonstationary, bringing a difficult question of model adaptation. Classical approaches like GARCH assume arbitrary type of dependence. To prevent such bias, we will focus on recently proposed agnostic philosophy of moving estimator: in time $t$ finding parameters optimizing e.g. $F_t=\sum_{\tau<t} (1-\eta)^{t-\tau} \ln(\rho_\theta (x_\tau))$ moving log-likelihood, evolving in time. It allows for example to estimate parameters using inexpensive exponential moving averages (EMA), like absolute central moments $E[|x-\mu|^p]$ evolving with $m_{p,t+1} = m_{p,t} + \eta (|x_t-\mu_t|^p-m_{p,t})$ for one or multiple powers $p\in\mathbb{R}^+$. Application of such general adaptive methods of moments will be presented on Student's t-distribution, popular especially in economical applications, here applied to log-returns of DJIA companies.
    Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry. (arXiv:2304.02902v1 [stat.ML])
    Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. Such symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.
    Efficient SAGE Estimation via Causal Structure Learning. (arXiv:2304.03113v1 [stat.ML])
    The Shapley Additive Global Importance (SAGE) value is a theoretically appealing interpretability method that fairly attributes global importance to a model's features. However, its exact calculation requires the computation of the feature's surplus performance contributions over an exponential number of feature sets. This is computationally expensive, particularly because estimating the surplus contributions requires sampling from conditional distributions. Thus, SAGE approximation algorithms only take a fraction of the feature sets into account. We propose $d$-SAGE, a method that accelerates SAGE approximation. $d$-SAGE is motivated by the observation that conditional independencies (CIs) between a feature and the model target imply zero surplus contributions, such that their computation can be skipped. To identify CIs, we leverage causal structure learning (CSL) to infer a graph that encodes (conditional) independencies in the data as $d$-separations. This is computationally more efficient because the expense of the one-time graph inference and the $d$-separation queries is negligible compared to the expense of surplus contribution evaluations. Empirically we demonstrate that $d$-SAGE enables the efficient and accurate estimation of SAGE values.
    Bayesian Optimization with Conformal Prediction Sets. (arXiv:2210.12496v3 [cs.LG] UPDATED)
    Bayesian optimization is a coherent, ubiquitous approach to decision-making under uncertainty, with applications including multi-arm bandits, active learning, and black-box optimization. Bayesian optimization selects decisions (i.e. objective function queries) with maximal expected utility with respect to the posterior distribution of a Bayesian model, which quantifies reducible, epistemic uncertainty about query outcomes. In practice, subjectively implausible outcomes can occur regularly for two reasons: 1) model misspecification and 2) covariate shift. Conformal prediction is an uncertainty quantification method with coverage guarantees even for misspecified models and a simple mechanism to correct for covariate shift. We propose conformal Bayesian optimization, which directs queries towards regions of search space where the model predictions have guaranteed validity, and investigate its behavior on a suite of black-box optimization tasks and tabular ranking tasks. In many cases we find that query coverage can be significantly improved without harming sample-efficiency.
    PAD: Towards Principled Adversarial Malware Detection Against Evasion Attacks. (arXiv:2302.11328v2 [cs.CR] UPDATED)
    Machine Learning (ML) techniques can facilitate the automation of malicious software (malware for short) detection, but suffer from evasion attacks. Many studies counter such attacks in heuristic manners, lacking theoretical guarantees and defense effectiveness. In this paper, we propose a new adversarial training framework, termed Principled Adversarial Malware Detection (PAD), which offers convergence guarantees for robust optimization methods. PAD lays on a learnable convex measurement that quantifies distribution-wise discrete perturbations to protect malware detectors from adversaries, whereby for smooth detectors, adversarial training can be performed with theoretical treatments. To promote defense effectiveness, we propose a new mixture of attacks to instantiate PAD to enhance deep neural network-based measurements and malware detectors. Experimental results on two Android malware datasets demonstrate: (i) the proposed method significantly outperforms the state-of-the-art defenses; (ii) it can harden ML-based malware detection against 27 evasion attacks with detection accuracies greater than 83.45%, at the price of suffering an accuracy decrease smaller than 2.16% in the absence of attacks; (iii) it matches or outperforms many anti-malware scanners in VirusTotal against realistic adversarial malware.
    Online Kernel CUSUM for Change-Point Detection. (arXiv:2211.15070v3 [stat.ME] UPDATED)
    We propose an efficient online kernel Cumulative Sum (CUSUM) method for change-point detection that utilizes the maximum over a set of kernel statistics to account for the unknown change-point location. Our approach exhibits increased sensitivity to small changes compared to existing methods, such as the Scan-B statistic, which corresponds to a non-parametric Shewhart chart-type procedure. We provide accurate analytic approximations for two key performance metrics: the Average Run Length (ARL) and Expected Detection Delay (EDD), which enable us to establish an optimal window length on the order of the logarithm of ARL to ensure minimal power loss relative to an oracle procedure with infinite memory. Such a finding parallels the classic result for window-limited Generalized Likelihood Ratio (GLR) procedure in parametric change-point detection literature. Moreover, we introduce a recursive calculation procedure for detection statistics to ensure constant computational and memory complexity, which is essential for online procedures. Through extensive experiments on simulated data and a real-world human activity dataset, we demonstrate the competitive performance of our method and validate our theoretical results.
    Delayed Feedback in Generalised Linear Bandits Revisited. (arXiv:2207.10786v3 [cs.LG] UPDATED)
    The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed. We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner. We show that a natural adaptation of an optimistic algorithm to the delayed feedback achieves a regret bound where the penalty for the delays is independent of the horizon. This result significantly improves upon existing work, where the best known regret bound has the delay penalty increasing with the horizon. We verify our theoretical results through experiments on simulated data.
    A Bayesian Framework for Causal Analysis of Recurrent Events in Presence of Immortal Risk. (arXiv:2304.03247v1 [stat.ME])
    Observational studies of recurrent event rates are common in biomedical statistics. Broadly, the goal is to estimate differences in event rates under two treatments within a defined target population over a specified followup window. Estimation with observational claims data is challenging because while membership in the target population is defined in terms of eligibility criteria, treatment is rarely assigned exactly at the time of eligibility. Ad-hoc solutions to this timing misalignment, such as assigning treatment at eligibility based on subsequent assignment, incorrectly attribute prior event rates to treatment - resulting in immortal risk bias. Even if eligibility and treatment are aligned, a terminal event process (e.g. death) often stops the recurrent event process of interest. Both processes are also censored so that events are not observed over the entire followup window. Our approach addresses misalignment by casting it as a treatment switching problem: some patients are on treatment at eligibility while others are off treatment but may switch to treatment at a specified time - if they survive long enough. We define and identify an average causal effect of switching under specified causal assumptions. Estimation is done using a g-computation framework with a joint semiparametric Bayesian model for the death and recurrent event processes. Computing the estimand for various switching times allows us to assess the impact of treatment timing. We apply the method to contrast hospitalization rates under different opioid treatment strategies among patients with chronic back pain using Medicare claims data.
    Scalable Stochastic Parametric Verification with Stochastic Variational Smoothed Model Checking. (arXiv:2205.05398v3 [cs.LG] UPDATED)
    Parametric verification of linear temporal properties for stochastic models can be expressed as computing the satisfaction probability of a certain property as a function of the parameters of the model. Smoothed model checking (smMC) aims at inferring the satisfaction function over the entire parameter space from a limited set of observations obtained via simulation. As observations are costly and noisy, smMC is framed as a Bayesian inference problem so that the estimates have an additional quantification of the uncertainty. In smMC the authors use Gaussian Processes (GP), inferred by means of the Expectation Propagation algorithm. This approach provides accurate reconstructions with statistically sound quantification of the uncertainty. However, it inherits the well-known scalability issues of GP. In this paper, we exploit recent advances in probabilistic machine learning to push this limitation forward, making Bayesian inference of smMC scalable to larger datasets and enabling its application to models with high dimensional parameter spaces. We propose Stochastic Variational Smoothed Model Checking (SV-smMC), a solution that exploits stochastic variational inference (SVI) to approximate the posterior distribution of the smMC problem. The strength and flexibility of SVI make SV-smMC applicable to two alternative probabilistic models: Gaussian Processes (GP) and Bayesian Neural Networks (BNN). The core ingredient of SVI is a stochastic gradient-based optimization that makes inference easily parallelizable and that enables GPU acceleration. In this paper, we compare the performances of smMC against those of SV-smMC by looking at the scalability, the computational efficiency and the accuracy of the reconstructed satisfaction function.
    Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation. (arXiv:2303.03237v2 [stat.ML] UPDATED)
    Sampling from Gibbs distributions $p(x) \propto \exp(-V(x)/\varepsilon)$ and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials $V$, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality in the worst case. For optimization, which can be seen as a low-temperature limit of sampling, it is known that smooth functions $V$ allow faster convergence rates. Specifically, for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is known to be $O(n^{-m/d})$, where the constant can potentially depend on $m, d$ and the function to be optimized. Hence, the curse of dimensionality can be alleviated for smooth functions at least in terms of the convergence rate. Recently, it has been shown that similarly fast rates can also be achieved with polynomial runtime $O(n^{3.5})$, where the exponent $3.5$ is independent of $m$ or $d$. Hence, it is natural to ask whether similar rates for sampling and log-partition computation are possible, and whether they can be realized in polynomial time with an exponent independent of $m$ and $d$. We show that the optimal rates for sampling and log-partition computation are sometimes equal and sometimes faster than for optimization. We then analyze various polynomial-time sampling algorithms, including an extension of a recent promising optimization approach, and find that they sometimes exhibit interesting behavior but no near-optimal rates. Our results also give further insights on the relation between sampling, log-partition, and optimization problems.
    Private Estimation with Public Data. (arXiv:2208.07984v2 [cs.LG] UPDATED)
    We initiate the study of differentially private (DP) estimation with access to a small amount of public data. For private estimation of d-dimensional Gaussians, we assume that the public data comes from a Gaussian that may have vanishing similarity in total variation distance with the underlying Gaussian of the private data. We show that under the constraints of pure or concentrated DP, d+1 public data samples are sufficient to remove any dependence on the range parameters of the private data distribution from the private sample complexity, which is known to be otherwise necessary without public data. For separated Gaussian mixtures, we assume that the underlying public and private distributions are the same, and we consider two settings: (1) when given a dimension-independent amount of public data, the private sample complexity can be improved polynomially in terms of the number of mixture components, and any dependence on the range parameters of the distribution can be removed in the approximate DP case; (2) when given an amount of public data linear in the dimension, the private sample complexity can be made independent of range parameters even under concentrated DP, and additional improvements can be made to the overall sample complexity.
    Ollivier-Ricci Curvature for Hypergraphs: A Unified Framework. (arXiv:2210.12048v3 [cs.LG] UPDATED)
    Bridging geometry and topology, curvature is a powerful and expressive invariant. While the utility of curvature has been theoretically and empirically confirmed in the context of manifolds and graphs, its generalization to the emerging domain of hypergraphs has remained largely unexplored. On graphs, the Ollivier-Ricci curvature measures differences between random walks via Wasserstein distances, thus grounding a geometric concept in ideas from probability theory and optimal transport. We develop ORCHID, a flexible framework generalizing Ollivier-Ricci curvature to hypergraphs, and prove that the resulting curvatures have favorable theoretical properties. Through extensive experiments on synthetic and real-world hypergraphs from different domains, we demonstrate that ORCHID curvatures are both scalable and useful to perform a variety of hypergraph tasks in practice.
    Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks. (arXiv:2212.13848v2 [cs.LG] UPDATED)
    We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noise, neural networks trained to nearly zero training error are inconsistent in this class, we focus on the early-stopped GD which allows us to show consistency and optimal rates. In particular, we explore this problem from the viewpoint of the Neural Tangent Kernel (NTK) approximation of a GD-trained finite-width neural network. We show that whenever some early stopping rule is guaranteed to give an optimal rate (of excess risk) on the Hilbert space of the kernel induced by the ReLU activation function, the same rule can be used to achieve minimax optimal rate for learning on the class of considered Lipschitz functions by neural networks. We discuss several data-free and data-dependent practically appealing stopping rules that yield optimal rates.
  • Open

    Scalable Stochastic Parametric Verification with Stochastic Variational Smoothed Model Checking. (arXiv:2205.05398v3 [cs.LG] UPDATED)
    Parametric verification of linear temporal properties for stochastic models can be expressed as computing the satisfaction probability of a certain property as a function of the parameters of the model. Smoothed model checking (smMC) aims at inferring the satisfaction function over the entire parameter space from a limited set of observations obtained via simulation. As observations are costly and noisy, smMC is framed as a Bayesian inference problem so that the estimates have an additional quantification of the uncertainty. In smMC the authors use Gaussian Processes (GP), inferred by means of the Expectation Propagation algorithm. This approach provides accurate reconstructions with statistically sound quantification of the uncertainty. However, it inherits the well-known scalability issues of GP. In this paper, we exploit recent advances in probabilistic machine learning to push this limitation forward, making Bayesian inference of smMC scalable to larger datasets and enabling its application to models with high dimensional parameter spaces. We propose Stochastic Variational Smoothed Model Checking (SV-smMC), a solution that exploits stochastic variational inference (SVI) to approximate the posterior distribution of the smMC problem. The strength and flexibility of SVI make SV-smMC applicable to two alternative probabilistic models: Gaussian Processes (GP) and Bayesian Neural Networks (BNN). The core ingredient of SVI is a stochastic gradient-based optimization that makes inference easily parallelizable and that enables GPU acceleration. In this paper, we compare the performances of smMC against those of SV-smMC by looking at the scalability, the computational efficiency and the accuracy of the reconstructed satisfaction function.  ( 3 min )
    Classification of Melanocytic Nevus Images using BigTransfer (BiT). (arXiv:2211.11872v2 [eess.IV] UPDATED)
    Skin cancer is a fatal disease that takes a heavy toll over human lives annually. The colored skin images show a significant degree of resemblance between different skin lesions such as melanoma and nevus, making identification and diagnosis more challenging. Melanocytic nevi may mature to cause fatal melanoma. Therefore, the current management protocol involves the removal of those nevi that appear intimidating. However, this necessitates resilient classification paradigms for classifying benign and malignant melanocytic nevi. Early diagnosis necessitates a dependable automated system for melanocytic nevi classification to render diagnosis efficient, timely, and successful. An automated classification algorithm is proposed in the given research. A neural network previously-trained on a separate problem statement is leveraged in this technique for classifying melanocytic nevus images. The suggested method uses BigTransfer (BiT), a ResNet-based transfer learning approach for classifying melanocytic nevi as malignant or benign. The results obtained are compared to that of current techniques, and the new method's classification rate is proven to outperform that of existing methods.  ( 2 min )
    Quantum Differential Privacy: An Information Theory Perspective. (arXiv:2202.10717v3 [quant-ph] UPDATED)
    Differential privacy has been an exceptionally successful concept when it comes to providing provable security guarantees for classical computations. More recently, the concept was generalized to quantum computations. While classical computations are essentially noiseless and differential privacy is often achieved by artificially adding noise, near-term quantum computers are inherently noisy and it was observed that this leads to natural differential privacy as a feature. In this work we discuss quantum differential privacy in an information theoretic framework by casting it as a quantum divergence. A main advantage of this approach is that differential privacy becomes a property solely based on the output states of the computation, without the need to check it for every measurement. This leads to simpler proofs and generalized statements of its properties as well as several new bounds for both, general and specific, noise models. In particular, these include common representations of quantum circuits and quantum machine learning concepts. Here, we focus on the difference in the amount of noise required to achieve certain levels of differential privacy versus the amount that would make any computation useless. Finally, we also generalize the classical concepts of local differential privacy, Renyi differential privacy and the hypothesis testing interpretation to the quantum setting, providing several new properties and insights.  ( 3 min )
    Resource Constrained Vehicular Edge Federated Learning with Highly Mobile Connected Vehicles. (arXiv:2210.15496v3 [eess.SY] UPDATED)
    This paper proposes a vehicular edge federated learning (VEFL) solution, where an edge server leverages highly mobile connected vehicles' (CVs') onboard central processing units (CPUs) and local datasets to train a global model. Convergence analysis reveals that the VEFL training loss depends on the successful receptions of the CVs' trained models over the intermittent vehicle-to-infrastructure (V2I) wireless links. Owing to high mobility, in the full device participation case (FDPC), the edge server aggregates client model parameters based on a weighted combination according to the CVs' dataset sizes and sojourn periods, while it selects a subset of CVs in the partial device participation case (PDPC). We then devise joint VEFL and radio access technology (RAT) parameters optimization problems under delay, energy and cost constraints to maximize the probability of successful reception of the locally trained models. Considering that the optimization problem is NP-hard, we decompose it into a VEFL parameter optimization sub-problem, given the estimated worst-case sojourn period, delay and energy expense, and an online RAT parameter optimization sub-problem. Finally, extensive simulations are conducted to validate the effectiveness of the proposed solutions with a practical 5G new radio (5G-NR) RAT under a realistic microscopic mobility model.  ( 3 min )
    Learn to Interpret Atari Agents. (arXiv:1812.11276v3 [cs.LG] UPDATED)
    Deep reinforcement learning (DeepRL) agents surpass human-level performance in many tasks. However, the direct mapping from states to actions makes it hard to interpret the rationale behind the decision-making of the agents. In contrast to previous a-posteriori methods for visualizing DeepRL policies, in this work, we propose to equip the DeepRL model with an innate visualization ability. Our proposed agent, named region-sensitive Rainbow (RS-Rainbow), is an end-to-end trainable network based on the original Rainbow, a powerful deep Q-network agent. It learns important regions in the input domain via an attention module. At inference time, after each forward pass, we can visualize regions that are most important to decision-making by backpropagating gradients from the attention module to the input frames. The incorporation of our proposed module not only improves model interpretability, but leads to performance improvement. Extensive experiments on games from the Atari 2600 suite demonstrate the effectiveness of RS-Rainbow.  ( 2 min )
    Denoising diffusion probabilistic models for probabilistic energy forecasting. (arXiv:2212.02977v5 [cs.LG] UPDATED)
    Scenario-based probabilistic forecasts have become vital for decision-makers in handling intermittent renewable energies. This paper presents a recent promising deep learning generative approach called denoising diffusion probabilistic models. It is a class of latent variable models which have recently demonstrated impressive results in the computer vision community. However, to our knowledge, there has yet to be a demonstration that they can generate high-quality samples of load, PV, or wind power time series, crucial elements to face the new challenges in power systems applications. Thus, we propose the first implementation of this model for energy forecasting using the open data of the Global Energy Forecasting Competition 2014. The results demonstrate this approach is competitive with other state-of-the-art deep learning generative models, including generative adversarial networks, variational autoencoders, and normalizing flows.  ( 2 min )
    Conformal Quantitative Predictive Monitoring of STL Requirements for Stochastic Processes. (arXiv:2211.02375v2 [eess.SY] UPDATED)
    We consider the problem of predictive monitoring (PM), i.e., predicting at runtime the satisfaction of a desired property from the current system's state. Due to its relevance for runtime safety assurance and online control, PM methods need to be efficient to enable timely interventions against predicted violations, while providing correctness guarantees. We introduce \textit{quantitative predictive monitoring (QPM)}, the first PM method to support stochastic processes and rich specifications given in Signal Temporal Logic (STL). Unlike most of the existing PM techniques that predict whether or not some property $\phi$ is satisfied, QPM provides a quantitative measure of satisfaction by predicting the quantitative (aka robust) STL semantics of $\phi$. QPM derives prediction intervals that are highly efficient to compute and with probabilistic guarantees, in that the intervals cover with arbitrary probability the STL robustness values relative to the stochastic evolution of the system. To do so, we take a machine-learning approach and leverage recent advances in conformal inference for quantile regression, thereby avoiding expensive Monte-Carlo simulations at runtime to estimate the intervals. We also show how our monitors can be combined in a compositional manner to handle composite formulas, without retraining the predictors nor sacrificing the guarantees. We demonstrate the effectiveness and scalability of QPM over a benchmark of four discrete-time stochastic processes with varying degrees of complexity.
    Interpreting wealth distribution via poverty map inference using multimodal data. (arXiv:2302.10793v2 [cs.LG] UPDATED)
    Poverty maps are essential tools for governments and NGOs to track socioeconomic changes and adequately allocate infrastructure and services in places in need. Sensor and online crowd-sourced data combined with machine learning methods have provided a recent breakthrough in poverty map inference. However, these methods do not capture local wealth fluctuations, and are not optimized to produce accountable results that guarantee accurate predictions to all sub-populations. Here, we propose a pipeline of machine learning models to infer the mean and standard deviation of wealth across multiple geographically clustered populated places, and illustrate their performance in Sierra Leone and Uganda. These models leverage seven independent and freely available feature sources based on satellite images, and metadata collected via online crowd-sourcing and social media. Our models show that combined metadata features are the best predictors of wealth in rural areas, outperforming image-based models, which are the best for predicting the highest wealth quintiles. Our results recover the local mean and variation of wealth, and correctly capture the positive yet non-monotonous correlation between them. We further demonstrate the capabilities and limitations of model transfer across countries and the effects of data recency and other biases. Our methodology provides open tools to build towards more transparent and interpretable models to help governments and NGOs to make informed decisions based on data availability, urbanization level, and poverty thresholds.
    Improving Fast Adversarial Training with Prior-Guided Knowledge. (arXiv:2304.00202v2 [cs.LG] UPDATED)
    Fast adversarial training (FAT) is an efficient method to improve robustness. However, the original FAT suffers from catastrophic overfitting, which dramatically and suddenly reduces robustness after a few training epochs. Although various FAT variants have been proposed to prevent overfitting, they require high training costs. In this paper, we investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and FAT. We find that catastrophic overfitting occurs when the attack success rate of adversarial examples becomes worse. Based on this observation, we propose a positive prior-guided adversarial initialization to prevent overfitting by improving adversarial example quality without extra training costs. This initialization is generated by using high-quality adversarial perturbations from the historical training process. We provide theoretical analysis for the proposed initialization and propose a prior-guided regularization method that boosts the smoothness of the loss function. Additionally, we design a prior-guided ensemble FAT method that averages the different model weights of historical models using different decay rates. Our proposed method, called FGSM-PGK, assembles the prior-guided knowledge, i.e., the prior-guided initialization and model weights, acquired during the historical training process. Evaluations of four datasets demonstrate the superiority of the proposed method.
    Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition. (arXiv:2304.01117v2 [cs.LG] UPDATED)
    Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main attraction of this approach is that it returns an interpretable model that can be insightful to users. Historically, the majority of algorithms for symbolic regression have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead utilize approaches such as enumeration algorithms, mixed linear integer programming, neural networks, and Bayesian optimization. In order to assess how well these new approaches behave on a set of common challenges often faced in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference consisting of different synthetic and real-world datasets which were blind to entrants. For the real-world track, we assessed interpretability in a realistic way by using a domain expert to judge the trustworthiness of candidate models.We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms and highlight possible improvements for future competitions.
    RARE: Robust Masked Graph Autoencoder. (arXiv:2304.01507v2 [cs.LG] UPDATED)
    Masked graph autoencoder (MGAE) has emerged as a promising self-supervised graph pre-training (SGP) paradigm due to its simplicity and effectiveness. However, existing efforts perform the mask-then-reconstruct operation in the raw data space as is done in computer vision (CV) and natural language processing (NLP) areas, while neglecting the important non-Euclidean property of graph data. As a result, the highly unstable local connection structures largely increase the uncertainty in inferring masked data and decrease the reliability of the exploited self-supervision signals, leading to inferior representations for downstream evaluations. To address this issue, we propose a novel SGP method termed Robust mAsked gRaph autoEncoder (RARE) to improve the certainty in inferring masked data and the reliability of the self-supervision mechanism by further masking and reconstructing node samples in the high-order latent feature space. Through both theoretical and empirical analyses, we have discovered that performing a joint mask-then-reconstruct strategy in both latent feature and raw data spaces could yield improved stability and performance. To this end, we elaborately design a masked latent feature completion scheme, which predicts latent features of masked nodes under the guidance of high-order sample correlations that are hard to be observed from the raw data perspective. Specifically, we first adopt a latent feature predictor to predict the masked latent features from the visible ones. Next, we encode the raw data of masked samples with a momentum graph encoder and subsequently employ the resulting representations to improve predicted results through latent feature matching. Extensive experiments on seventeen datasets have demonstrated the effectiveness and robustness of RARE against state-of-the-art (SOTA) competitors across three downstream tasks.
    Task-oriented Memory-efficient Pruning-Adapter. (arXiv:2303.14704v2 [cs.CL] UPDATED)
    The Outstanding performance and growing size of Large Language Models has led to increased attention in parameter efficient learning. The two predominant approaches are Adapters and Pruning. Adapters are to freeze the model and give it a new weight matrix on the side, which can significantly reduce the time and memory of training, but the cost is that the evaluation and testing will increase the time and memory consumption. Pruning is to cut off some weight and re-distribute the remaining weight, which sacrifices the complexity of training at the cost of extremely high memory and training time, making the cost of evaluation and testing relatively low. So efficiency of training and inference can't be obtained in the same time. In this work, we propose a task-oriented Pruning-Adapter method that achieve a high memory efficiency of training and memory, and speeds up training time and ensures no significant decrease in accuracy in GLUE tasks, achieving training and inference efficiency at the same time.
    ConDA: Unsupervised Domain Adaptation for LiDAR Segmentation via Regularized Domain Concatenation. (arXiv:2111.15242v3 [cs.CV] UPDATED)
    Transferring knowledge learned from the labeled source domain to the raw target domain for unsupervised domain adaptation (UDA) is essential to the scalable deployment of autonomous driving systems. State-of-the-art methods in UDA often employ a key idea: utilizing joint supervision signals from both source and target domains for self-training. In this work, we improve and extend this aspect. We present ConDA, a concatenation-based domain adaptation framework for LiDAR segmentation that: 1) constructs an intermediate domain consisting of fine-grained interchange signals from both source and target domains without destabilizing the semantic coherency of objects and background around the ego-vehicle; and 2) utilizes the intermediate domain for self-training. To improve the network training on the source domain and self-training on the intermediate domain, we propose an anti-aliasing regularizer and an entropy aggregator to reduce the negative effect caused by the aliasing artifacts and noisy pseudo labels. Through extensive studies, we demonstrate that ConDA significantly outperforms prior arts in mitigating domain gaps.
    Time series anomaly detection with reconstruction-based state-space models. (arXiv:2303.03324v2 [cs.LG] UPDATED)
    Recent advances in digitization have led to the availability of multivariate time series data in various domains, enabling real-time monitoring of operations. Identifying abnormal data patterns and detecting potential failures in these scenarios are important yet rather challenging. In this work, we propose a novel unsupervised anomaly detection method for time series data. The proposed framework jointly learns the observation model and the dynamic model, and model uncertainty is estimated from normal samples. Specifically, a long short-term memory (LSTM)-based encoder-decoder is adopted to represent the mapping between the observation space and the latent space. Bidirectional transitions of states are simultaneously modeled by leveraging backward and forward temporal information. Regularization of the latent space places constraints on the states of normal samples, and Mahalanobis distance is used to evaluate the abnormality level. Empirical studies on synthetic and real-world datasets demonstrate the superior performance of the proposed method in anomaly detection tasks.
    Temporal Dynamic Synchronous Functional Brain Network for Schizophrenia Diagnosis and Lateralization Analysis. (arXiv:2304.01347v2 [q-bio.NC] UPDATED)
    The available evidence suggests that dynamic functional connectivity (dFC) can capture time-varying abnormalities in brain activity in resting-state cerebral functional magnetic resonance imaging (rs-fMRI) data and has a natural advantage in uncovering mechanisms of abnormal brain activity in schizophrenia(SZ) patients. Hence, an advanced dynamic brain network analysis model called the temporal brain category graph convolutional network (Temporal-BCGCN) was employed. Firstly, a unique dynamic brain network analysis module, DSF-BrainNet, was designed to construct dynamic synchronization features. Subsequently, a revolutionary graph convolution method, TemporalConv, was proposed, based on the synchronous temporal properties of feature. Finally, the first modular abnormal hemispherical lateralization test tool in deep learning based on rs-fMRI data, named CategoryPool, was proposed. This study was validated on COBRE and UCLA datasets and achieved 83.62% and 89.71% average accuracies, respectively, outperforming the baseline model and other state-of-the-art methods. The ablation results also demonstrate the advantages of TemporalConv over the traditional edge feature graph convolution approach and the improvement of CategoryPool over the classical graph pooling approach. Interestingly, this study showed that the lower order perceptual system and higher order network regions in the left hemisphere are more severely dysfunctional than in the right hemisphere in SZ and reaffirms the importance of the left medial superior frontal gyrus in SZ. Our core code is available at: https://github.com/swfen/Temporal-BCGCN.
    Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation. (arXiv:2303.03237v2 [stat.ML] UPDATED)
    Sampling from Gibbs distributions $p(x) \propto \exp(-V(x)/\varepsilon)$ and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials $V$, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality in the worst case. For optimization, which can be seen as a low-temperature limit of sampling, it is known that smooth functions $V$ allow faster convergence rates. Specifically, for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is known to be $O(n^{-m/d})$, where the constant can potentially depend on $m, d$ and the function to be optimized. Hence, the curse of dimensionality can be alleviated for smooth functions at least in terms of the convergence rate. Recently, it has been shown that similarly fast rates can also be achieved with polynomial runtime $O(n^{3.5})$, where the exponent $3.5$ is independent of $m$ or $d$. Hence, it is natural to ask whether similar rates for sampling and log-partition computation are possible, and whether they can be realized in polynomial time with an exponent independent of $m$ and $d$. We show that the optimal rates for sampling and log-partition computation are sometimes equal and sometimes faster than for optimization. We then analyze various polynomial-time sampling algorithms, including an extension of a recent promising optimization approach, and find that they sometimes exhibit interesting behavior but no near-optimal rates. Our results also give further insights on the relation between sampling, log-partition, and optimization problems.
    Variable-Based Calibration for Machine Learning Classifiers. (arXiv:2209.15154v3 [cs.LG] UPDATED)
    The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confidence scores for model predictions. In this paper we introduce the notion of variable-based calibration to characterize calibration properties of a model with respect to a variable of interest, generalizing traditional score-based metrics such as expected calibration error (ECE). In particular, we find that models with near-perfect ECE can exhibit significant miscalibration as a function of features of the data. We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing calibration methods. To mitigate this issue, we propose strategies for detection, visualization, and quantification of variable-based calibration error. We then examine the limitations of current score-based calibration methods and explore potential modifications. Finally, we discuss the implications of these findings, emphasizing that an understanding of calibration beyond simple aggregate measures is crucial for endeavors such as fairness and model interpretability.
    Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks. (arXiv:2212.13848v2 [cs.LG] UPDATED)
    We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noise, neural networks trained to nearly zero training error are inconsistent in this class, we focus on the early-stopped GD which allows us to show consistency and optimal rates. In particular, we explore this problem from the viewpoint of the Neural Tangent Kernel (NTK) approximation of a GD-trained finite-width neural network. We show that whenever some early stopping rule is guaranteed to give an optimal rate (of excess risk) on the Hilbert space of the kernel induced by the ReLU activation function, the same rule can be used to achieve minimax optimal rate for learning on the class of considered Lipschitz functions by neural networks. We discuss several data-free and data-dependent practically appealing stopping rules that yield optimal rates.
    EmoGator: A New Open Source Vocal Burst Dataset with Baseline Machine Learning Classification Methodologies. (arXiv:2301.00508v2 [cs.SD] UPDATED)
    Vocal Bursts -- short, non-speech vocalizations that convey emotions, such as laughter, cries, sighs, moans, and groans -- are an often-overlooked aspect of speech emotion recognition, but an important aspect of human vocal communication. One barrier to study of these interesting vocalizations is a lack of large datasets. I am pleased to introduce the EmoGator dataset, which consists of 32,130 samples from 357 speakers, 16.9654 hours of audio; each sample classified into one of 30 distinct emotion categories by the speaker. Several different approaches to construct classifiers to identify emotion categories will be discussed, and directions for future research will be suggested. Data set is available for download from https://github.com/fredbuhl/EmoGator.
    Dataless Knowledge Fusion by Merging Weights of Language Models. (arXiv:2212.09849v2 [cs.CL] UPDATED)
    Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Oftentimes fine-tuned models are readily available but their training data is not, due to data privacy or intellectual property concerns. This creates a barrier to fusing knowledge across individual models to yield a better single model. In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that performs well both across all data set domains and can generalize on out-of-domain data. We propose a dataless knowledge fusion method that merges models in their parameter space, guided by weights that minimize prediction differences between the merged model and the individual models. Over a battery of evaluation settings, we show that the proposed method significantly outperforms baselines such as Fisher-weighted averaging or model ensembling. Further, we find that our method is a promising alternative to multi-task learning that can preserve or sometimes improve over the individual models without access to the training data. Finally, model merging is more efficient than training a multi-task model, thus making it applicable to a wider set of scenarios.
    Blurring-Sharpening Process Models for Collaborative Filtering. (arXiv:2211.09324v2 [cs.IR] UPDATED)
    Collaborative filtering is one of the most fundamental topics for recommender systems. Various methods have been proposed for collaborative filtering, ranging from matrix factorization to graph convolutional methods. Being inspired by recent successes of graph filtering-based methods and score-based generative models (SGMs), we present a novel concept of blurring-sharpening process model (BSPM). SGMs and BSPMs share the same processing philosophy that new information can be discovered (e.g., new images are generated in the case of SGMs) while original information is first perturbed and then recovered to its original form. However, SGMs and our BSPMs deal with different types of information, and their optimal perturbation and recovery processes have fundamental discrepancies. Therefore, our BSPMs have different forms from SGMs. In addition, our concept not only theoretically subsumes many existing collaborative filtering models but also outperforms them in terms of Recall and NDCG in the three benchmark datasets, Gowalla, Yelp2018, and Amazon-book. In addition, the processing time of our method is comparable to other fast baselines. Our proposed concept has much potential in the future to be enhanced by designing better blurring (i.e., perturbation) and sharpening (i.e., recovery) processes than what we use in this paper.
    PAD: Towards Principled Adversarial Malware Detection Against Evasion Attacks. (arXiv:2302.11328v2 [cs.CR] UPDATED)
    Machine Learning (ML) techniques can facilitate the automation of malicious software (malware for short) detection, but suffer from evasion attacks. Many studies counter such attacks in heuristic manners, lacking theoretical guarantees and defense effectiveness. In this paper, we propose a new adversarial training framework, termed Principled Adversarial Malware Detection (PAD), which offers convergence guarantees for robust optimization methods. PAD lays on a learnable convex measurement that quantifies distribution-wise discrete perturbations to protect malware detectors from adversaries, whereby for smooth detectors, adversarial training can be performed with theoretical treatments. To promote defense effectiveness, we propose a new mixture of attacks to instantiate PAD to enhance deep neural network-based measurements and malware detectors. Experimental results on two Android malware datasets demonstrate: (i) the proposed method significantly outperforms the state-of-the-art defenses; (ii) it can harden ML-based malware detection against 27 evasion attacks with detection accuracies greater than 83.45%, at the price of suffering an accuracy decrease smaller than 2.16% in the absence of attacks; (iii) it matches or outperforms many anti-malware scanners in VirusTotal against realistic adversarial malware.
    Delay Embedded Echo-State Network: A Predictor for Partially Observed Systems. (arXiv:2211.05992v2 [eess.SY] UPDATED)
    This paper considers the problem of data-driven prediction of partially observed systems using a recurrent neural network. While neural network based dynamic predictors perform well with full-state training data, prediction with partial observation during training phase poses a significant challenge. Here a predictor for partial observations is developed using an echo-state network (ESN) and time delay embedding of the partially observed state. The proposed method is theoretically justified with Taken's embedding theorem and strong observability of a nonlinear system. The efficacy of the proposed method is demonstrated on three systems: two synthetic datasets from chaotic dynamical systems and a set of real-time traffic data.
    Analyzing Leakage of Personally Identifiable Information in Language Models. (arXiv:2302.00539v3 [cs.LG] UPDATED)
    Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage. Scrubbing techniques reduce but do not prevent the risk of PII leakage: in practice scrubbing is imperfect and must balance the trade-off between minimizing disclosure and preserving the utility of the dataset. On the other hand, it is unclear to which extent algorithmic defenses such as differential privacy, designed to guarantee sentence- or user-level privacy, prevent PII disclosure. In this work, we introduce rigorous game-based definitions for three types of PII leakage via black-box extraction, inference, and reconstruction attacks with only API access to an LM. We empirically evaluate the attacks against GPT-2 models fine-tuned with and without defenses in three domains: case law, health care, and e-mails. Our main contributions are (i) novel attacks that can extract up to 10$\times$ more PII sequences than existing attacks, (ii) showing that sentence-level differential privacy reduces the risk of PII disclosure but still leaks about 3% of PII sequences, and (iii) a subtle connection between record-level membership inference and PII reconstruction. Code to reproduce all experiments in the paper is available at https://github.com/microsoft/analysing_pii_leakage.
    IoT Botnet Detection Using an Economic Deep Learning Model. (arXiv:2302.02013v3 [cs.CR] UPDATED)
    The rapid progress in technology innovation usage and distribution has increased in the last decade. The rapid growth of the Internet of Things (IoT) systems worldwide has increased network security challenges created by malicious third parties. Thus, reliable intrusion detection and network forensics systems that consider security concerns and IoT systems limitations are essential to protect such systems. IoT botnet attacks are one of the significant threats to enterprises and individuals. Thus, this paper proposed an economic deep learning-based model for detecting IoT botnet attacks along with different types of attacks. The proposed model achieved higher accuracy than the state-of-the-art detection models using a smaller implementation budget and accelerating the training and detecting processes.
    StratDef: Strategic Defense Against Adversarial Attacks in ML-based Malware Detection. (arXiv:2202.07568v5 [cs.LG] UPDATED)
    Over the years, most research towards defenses against adversarial attacks on machine learning models has been in the image recognition domain. The malware detection domain has received less attention despite its importance. Moreover, most work exploring these defenses has focused on several methods but with no strategy when applying them. In this paper, we introduce StratDef, which is a strategic defense system based on a moving target defense approach. We overcome challenges related to the systematic construction, selection, and strategic use of models to maximize adversarial robustness. StratDef dynamically and strategically chooses the best models to increase the uncertainty for the attacker while minimizing critical aspects in the adversarial ML domain, like attack transferability. We provide the first comprehensive evaluation of defenses against adversarial attacks on machine learning for malware detection, where our threat model explores different levels of threat, attacker knowledge, capabilities, and attack intensities. We show that StratDef performs better than other defenses even when facing the peak adversarial threat. We also show that, of the existing defenses, only a few adversarially-trained models provide substantially better protection than just using vanilla models but are still outperformed by StratDef.
    Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark. (arXiv:2304.03279v1 [cs.LG])
    Artificial agents have traditionally been trained to maximize reward, which may incentivize power-seeking and deception, analogous to how next-token prediction in language models (LMs) may incentivize toxicity. So do agents naturally learn to be Machiavellian? And how do we measure these behaviors in general-purpose models such as GPT-4? Towards answering these questions, we introduce MACHIAVELLI, a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios that center on social decision-making. Scenario labeling is automated with LMs, which are more performant than human annotators. We mathematize dozens of harmful behaviors and use our annotations to evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations. We observe some tension between maximizing reward and behaving ethically. To improve this trade-off, we investigate LM-based methods to steer agents' towards less harmful behaviors. Our results show that agents can both act competently and morally, so concrete progress can currently be made in machine ethics--designing agents that are Pareto improvements in both safety and capabilities.
    Teaching Matters: Investigating the Role of Supervision in Vision Transformers. (arXiv:2212.03862v2 [cs.CV] UPDATED)
    Vision Transformers (ViTs) have gained significant popularity in recent years and have proliferated into many applications. However, their behavior under different learning paradigms is not well explored. We compare ViTs trained through different methods of supervision, and show that they learn a diverse range of behaviors in terms of their attention, representations, and downstream performance. We also discover ViT behaviors that are consistent across supervision, including the emergence of Offset Local Attention Heads. These are self-attention heads that attend to a token adjacent to the current token with a fixed directional offset, a phenomenon that to the best of our knowledge has not been highlighted in any prior work. Our analysis shows that ViTs are highly flexible and learn to process local and global information in different orders depending on their training method. We find that contrastive self-supervised methods learn features that are competitive with explicitly supervised features, and they can even be superior for part-level tasks. We also find that the representations of reconstruction-based models show non-trivial similarity to contrastive self-supervised models. Project website (https://www.cs.umd.edu/~sakshams/vit_analysis) and code (https://www.github.com/mwalmer-umd/vit_analysis) are publicly available.
    DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics. (arXiv:2304.03223v1 [cs.CV])
    In this work, we aim to learn dexterous manipulation of deformable objects using multi-fingered hands. Reinforcement learning approaches for dexterous rigid object manipulation would struggle in this setting due to the complexity of physics interaction with deformable objects. At the same time, previous trajectory optimization approaches with differentiable physics for deformable manipulation would suffer from local optima caused by the explosion of contact modes from hand-object interactions. To address these challenges, we propose DexDeform, a principled framework that abstracts dexterous manipulation skills from human demonstration and refines the learned skills with differentiable physics. Concretely, we first collect a small set of human demonstrations using teleoperation. And we then train a skill model using demonstrations for planning over action abstractions in imagination. To explore the goal space, we further apply augmentations to the existing deformable shapes in demonstrations and use a gradient optimizer to refine the actions planned by the skill model. Finally, we adopt the refined trajectories as new demonstrations for finetuning the skill model. To evaluate the effectiveness of our approach, we introduce a suite of six challenging dexterous deformable object manipulation tasks. Compared with baselines, DexDeform is able to better explore and generalize across novel goals unseen in the initial human demonstrations.
    Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search. (arXiv:2206.00702v8 [cs.AI] UPDATED)
    Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. To this end, AdaSubS generates diverse sets of subgoals at different distances. A verification mechanism is employed to filter out unreachable subgoals swiftly, allowing to focus on feasible further subgoals. In this way, AdaSubS benefits from the efficiency of planning with longer subgoals and the fine control with the shorter ones, and thus scales well to difficult planning problems. We show that AdaSubS significantly surpasses hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik's Cube, and inequality proving benchmark INT.
    DiffMimic: Efficient Motion Mimicking with Differentiable Physics. (arXiv:2304.03274v1 [cs.CV])
    Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard explorations. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task to a much simpler state matching problem. In particular, DPS learns a stable policy by analytical gradients with ground-truth physical priors hence leading to significantly faster and stabler convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation in a long horizon. Extensive experiments on standard benchmarks show that DiffMimic has a better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn Backflip after 10 minutes of training and be able to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle Backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable clothes simulation in future research.
    Road Network Guided Fine-Grained Urban Traffic Flow Inference. (arXiv:2109.14251v2 [cs.LG] UPDATED)
    Accurate inference of fine-grained traffic flow from coarse-grained one is an emerging yet crucial problem, which can help greatly reduce the number of the required traffic monitoring sensors for cost savings. In this work, we notice that traffic flow has a high correlation with road network, which was either completely ignored or simply treated as an external factor in previous works.To facilitate this problem, we propose a novel Road-Aware Traffic Flow Magnifier (RATFM) that explicitly exploits the prior knowledge of road networks to fully learn the road-aware spatial distribution of fine-grained traffic flow. Specifically, a multi-directional 1D convolutional layer is first introduced to extract the semantic feature of the road network. Subsequently, we incorporate the road network feature and coarse-grained flow feature to regularize the short-range spatial distribution modeling of road-relative traffic flow. Furthermore, we take the road network feature as a query to capture the long-range spatial distribution of traffic flow with a transformer architecture. Benefiting from the road-aware inference mechanism, our method can generate high-quality fine-grained traffic flow maps. Extensive experiments on three real-world datasets show that the proposed RATFM outperforms state-of-the-art models under various scenarios. Our code and datasets are released at {\url{https://github.com/luimoli/RATFM}}.
    Primal Estimated Subgradient Solver for SVM for Imbalanced Classification. (arXiv:2206.09311v4 [cs.LG] UPDATED)
    We aim to demonstrate in experiments that our cost sensitive PEGASOS SVM achieves good performance on imbalanced data sets with a Majority to Minority Ratio ranging from 8.6:1 to 130:1 and to ascertain whether the including intercept (bias), regularization and parameters affects performance on our selection of datasets. Although many resort to SMOTE methods, we aim for a less computationally intensive method. We evaluate the performance by examining the learning curves. These curves diagnose whether we overfit or underfit or we choose over representative or under representative training/test data. We will also see the background of the hyperparameters versus the test and train error in validation curves. We benchmark our PEGASOS Cost-Sensitive SVM's results of Ding's LINEAR SVM DECIDL method. He obtained an ROC-AUC of .5 in one dataset. Our work will extend the work of Ding by incorporating kernels into SVM. We will use Python rather than MATLAB as python has dictionaries for storing mixed data types during multi-parameter cross-validation.
    Behavioral estimates of conceptual structure are robust across tasks in humans but not large language models. (arXiv:2304.02754v1 [cs.AI])
    Neural network models of language have long been used as a tool for developing hypotheses about conceptual representation in the mind and brain. For many years, such use involved extracting vector-space representations of words and using distances among these to predict or understand human behavior in various semantic tasks. In contemporary language AIs, however, it is possible to interrogate the latent structure of conceptual representations using methods nearly identical to those commonly used with human participants. The current work uses two common techniques borrowed from cognitive psychology to estimate and compare lexical-semantic structure in both humans and a well-known AI, the DaVinci variant of GPT-3. In humans, we show that conceptual structure is robust to differences in culture, language, and method of estimation. Structures estimated from AI behavior, while individually fairly consistent with those estimated from human behavior, depend much more upon the particular task used to generate behavior responses--responses generated by the very same model in the two tasks yield estimates of conceptual structure that cohere less with one another than do human structure estimates. The results suggest one important way that knowledge inhering in contemporary AIs can differ from human cognition.  ( 3 min )
    HomPINNs: homotopy physics-informed neural networks for solving the inverse problems of nonlinear differential equations with multiple solutions. (arXiv:2304.02811v1 [cs.LG])
    Due to the complex behavior arising from non-uniqueness, symmetry, and bifurcations in the solution space, solving inverse problems of nonlinear differential equations (DEs) with multiple solutions is a challenging task. To address this issue, we propose homotopy physics-informed neural networks (HomPINNs), a novel framework that leverages homotopy continuation and neural networks (NNs) to solve inverse problems. The proposed framework begins with the use of a NN to simultaneously approximate known observations and conform to the constraints of DEs. By utilizing the homotopy continuation method, the approximation traces the observations to identify multiple solutions and solve the inverse problem. The experiments involve testing the performance of the proposed method on one-dimensional DEs and applying it to solve a two-dimensional Gray-Scott simulation. Our findings demonstrate that the proposed method is scalable and adaptable, providing an effective solution for solving DEs with multiple solutions and unknown parameters. Moreover, it has significant potential for various applications in scientific computing, such as modeling complex systems and solving inverse problems in physics, chemistry, biology, etc.  ( 3 min )
    UniASM: Binary Code Similarity Detection without Fine-tuning. (arXiv:2211.01144v3 [cs.CR] UPDATED)
    Binary code similarity detection (BCSD) is widely used in various binary analysis tasks such as vulnerability search, malware detection, clone detection, and patch analysis. Recent studies have shown that the learning-based binary code embedding models perform better than the traditional feature-based approaches. In this paper, we propose a novel transformer-based binary code embedding model named UniASM to learn representations of the binary functions. We design two new training tasks to make the spatial distribution of the generated vectors more uniform, which can be used directly in BCSD without any fine-tuning. In addition, we present a new tokenization approach for binary functions, which increases the token's semantic information and mitigates the out-of-vocabulary (OOV) problem. We conduct an in-depth analysis of the factors affecting model performance through ablation experiments and obtain some new and valuable findings. The experimental results show that UniASM outperforms the state-of-the-art (SOTA) approach on the evaluation dataset. The average scores of Recall@1 on cross-compilers, cross-optimization levels, and cross-obfuscations are 0.77, 0.72, and 0.72. Besides, in the real-world task of known vulnerability search, UniASM outperforms all the current baselines.
    Principal component analysis for Gaussian process posteriors. (arXiv:2107.07115v2 [stat.ML] UPDATED)
    This paper proposes an extension of principal component analysis for Gaussian process (GP) posteriors, denoted by GP-PCA. Since GP-PCA estimates a low-dimensional space of GP posteriors, it can be used for meta-learning, which is a framework for improving the performance of target tasks by estimating a structure of a set of tasks. The issue is how to define a structure of a set of GPs with an infinite-dimensional parameter, such as coordinate system and a divergence. In this study, we reduce the infiniteness of GP to the finite-dimensional case under the information geometrical framework by considering a space of GP posteriors that have the same prior. In addition, we propose an approximation method of GP-PCA based on variational inference and demonstrate the effectiveness of GP-PCA as meta-learning through experiments.
    Sequential Adversarial Anomaly Detection for One-Class Event Data. (arXiv:1910.09161v5 [stat.ML] UPDATED)
    We consider the sequential anomaly detection problem in the one-class setting when only the anomalous sequences are available and propose an adversarial sequential detector by solving a minimax problem to find an optimal detector against the worst-case sequences from a generator. The generator captures the dependence in sequential events using the marked point process model. The detector sequentially evaluates the likelihood of a test sequence and compares it with a time-varying threshold, also learned from data through the minimax problem. We demonstrate our proposed method's good performance using numerical experiments on simulations and proprietary large-scale credit card fraud datasets. The proposed method can generally apply to detecting anomalous sequences.
    Krylov Methods are (nearly) Optimal for Low-Rank Approximation. (arXiv:2304.03191v1 [cs.DS])
    We consider the problem of rank-$1$ low-rank approximation (LRA) in the matrix-vector product model under various Schatten norms: $$ \min_{\|u\|_2=1} \|A (I - u u^\top)\|_{\mathcal{S}_p} , $$ where $\|M\|_{\mathcal{S}_p}$ denotes the $\ell_p$ norm of the singular values of $M$. Given $\varepsilon>0$, our goal is to output a unit vector $v$ such that $$ \|A(I - vv^\top)\|_{\mathcal{S}_p} \leq (1+\varepsilon) \min_{\|u\|_2=1}\|A(I - u u^\top)\|_{\mathcal{S}_p}. $$ Our main result shows that Krylov methods (nearly) achieve the information-theoretically optimal number of matrix-vector products for Spectral ($p=\infty$), Frobenius ($p=2$) and Nuclear ($p=1$) LRA. In particular, for Spectral LRA, we show that any algorithm requires $\Omega\left(\log(n)/\varepsilon^{1/2}\right)$ matrix-vector products, exactly matching the upper bound obtained by Krylov methods [MM15, BCW22]. Our lower bound addresses Open Question 1 in [Woo14], providing evidence for the lack of progress on algorithms for Spectral LRA and resolves Open Question 1.2 in [BCW22]. Next, we show that for any fixed constant $p$, i.e. $1\leq p =O(1)$, there is an upper bound of $O\left(\log(1/\varepsilon)/\varepsilon^{1/3}\right)$ matrix-vector products, implying that the complexity does not grow as a function of input size. This improves the $O\left(\log(n/\varepsilon)/\varepsilon^{1/3}\right)$ bound recently obtained in [BCW22], and matches their $\Omega\left(1/\varepsilon^{1/3}\right)$ lower bound, to a $\log(1/\varepsilon)$ factor.
    Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning. (arXiv:2109.12021v5 [cs.AR] UPDATED)
    Past research has proposed numerous hardware prefetching techniques, most of which rely on exploiting one specific type of program context information (e.g., program counter, cacheline address) to predict future memory accesses. These techniques either completely neglect a prefetcher's undesirable effects (e.g., memory bandwidth usage) on the overall system, or incorporate system-level feedback as an afterthought to a system-unaware prefetch algorithm. We show that prior prefetchers often lose their performance benefit over a wide range of workloads and system configurations due to their inherent inability to take multiple different types of program context and system-level feedback information into account while prefetching. In this paper, we make a case for designing a holistic prefetch algorithm that learns to prefetch using multiple different types of program context and system-level feedback information inherent to its design. To this end, we propose Pythia, which formulates the prefetcher as a reinforcement learning agent. For every demand request, Pythia observes multiple different types of program context information to make a prefetch decision. For every prefetch decision, Pythia receives a numerical reward that evaluates prefetch quality under the current memory bandwidth usage. Pythia uses this reward to reinforce the correlation between program context information and prefetch decision to generate highly accurate, timely, and system-aware prefetch requests in the future. Our extensive evaluations using simulation and hardware synthesis show that Pythia outperforms multiple state-of-the-art prefetchers over a wide range of workloads and system configurations, while incurring only 1.03% area overhead over a desktop-class processor and no software changes in workloads. The source code of Pythia can be freely downloaded from https://github.com/CMU-SAFARI/Pythia.
    Robust Upper Bounds for Adversarial Training. (arXiv:2112.09279v2 [cs.LG] UPDATED)
    Many state-of-the-art adversarial training methods for deep learning leverage upper bounds of the adversarial loss to provide security guarantees against adversarial attacks. Yet, these methods rely on convex relaxations to propagate lower and upper bounds for intermediate layers, which affect the tightness of the bound at the output layer. We introduce a new approach to adversarial training by minimizing an upper bound of the adversarial loss that is based on a holistic expansion of the network instead of separate bounds for each layer. This bound is facilitated by state-of-the-art tools from Robust Optimization; it has closed-form and can be effectively trained using backpropagation. We derive two new methods with the proposed approach. The first method (Approximated Robust Upper Bound or aRUB) uses the first order approximation of the network as well as basic tools from Linear Robust Optimization to obtain an empirical upper bound of the adversarial loss that can be easily implemented. The second method (Robust Upper Bound or RUB), computes a provable upper bound of the adversarial loss. Across a variety of tabular and vision data sets we demonstrate the effectiveness of our approach -- RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations.
    Causal Discovery with Score Matching on Additive Models with Arbitrary Noise. (arXiv:2304.03265v1 [cs.LG])
    Causal discovery methods are intrinsically constrained by the set of assumptions needed to ensure structure identifiability. Moreover additional restrictions are often imposed in order to simplify the inference task: this is the case for the Gaussian noise assumption on additive non-linear models, which is common to many causal discovery approaches. In this paper we show the shortcomings of inference under this hypothesis, analyzing the risk of edge inversion under violation of Gaussianity of the noise terms. Then, we propose a novel method for inferring the topological ordering of the variables in the causal graph, from data generated according to an additive non-linear model with a generic noise distribution. This leads to NoGAM (Not only Gaussian Additive noise Models), a causal discovery algorithm with a minimal set of assumptions and state of the art performance, experimentally benchmarked on synthetic data.
    Spectral-Spatial Global Graph Reasoning for Hyperspectral Image Classification. (arXiv:2106.13952v2 [cs.CV] UPDATED)
    Convolutional neural networks have been widely applied to hyperspectral image classification. However, traditional convolutions can not effectively extract features for objects with irregular distributions. Recent methods attempt to address this issue by performing graph convolutions on spatial topologies, but fixed graph structures and local perceptions limit their performances. To tackle these problems, in this paper, different from previous approaches, we perform the superpixel generation on intermediate features during network training to adaptively produce homogeneous regions, obtain graph structures, and further generate spatial descriptors, which are served as graph nodes. Besides spatial objects, we also explore the graph relationships between channels by reasonably aggregating channels to generate spectral descriptors. The adjacent matrices in these graph convolutions are obtained by considering the relationships among all descriptors to realize global perceptions. By combining the extracted spatial and spectral graph features, we finally obtain a spectral-spatial graph reasoning network (SSGRN). The spatial and spectral parts of SSGRN are separately called spatial and spectral graph reasoning subnetworks. Comprehensive experiments on four public datasets demonstrate the competitiveness of the proposed methods compared with other state-of-the-art graph convolution-based approaches.
    Efficient Audio Captioning Transformer with Patchout and Text Guidance. (arXiv:2304.02916v1 [cs.SD])
    Automated audio captioning is multi-modal translation task that aim to generate textual descriptions for a given audio clip. In this paper we propose a full Transformer architecture that utilizes Patchout as proposed in [1], significantly reducing the computational complexity and avoiding overfitting. The caption generation is partly conditioned on textual AudioSet tags extracted by a pre-trained classification model which is fine-tuned to maximize the semantic similarity between AudioSet labels and ground truth captions. To mitigate the data scarcity problem of Automated Audio Captioning we introduce transfer learning from an upstream audio-related task and an enlarged in-domain dataset. Moreover, we propose a method to apply Mixup augmentation for AAC. Ablation studies are carried out to investigate how Patchout and text guidance contribute to the final performance. The results show that the proposed techniques improve the performance of our system and while reducing the computational complexity. Our proposed method received the Judges Award at the Task6A of DCASE Challenge 2022.  ( 2 min )
    Spectral Gap Regularization of Neural Networks. (arXiv:2304.03096v1 [stat.ML])
    We introduce Fiedler regularization, a novel approach for regularizing neural networks that utilizes spectral/graphical information. Existing regularization methods often focus on penalizing weights in a global/uniform manner that ignores the connectivity structure of the neural network. We propose to use the Fiedler value of the neural network's underlying graph as a tool for regularization. We provide theoretical motivation for this approach via spectral graph theory. We demonstrate several useful properties of the Fiedler value that make it useful as a regularization tool. We provide an approximate, variational approach for faster computation during training. We provide an alternative formulation of this framework in the form of a structurally weighted $\text{L}_1$ penalty, thus linking our approach to sparsity induction. We provide uniform generalization error bounds for Fiedler regularization via a Rademacher complexity analysis. We performed experiments on datasets that compare Fiedler regularization with classical regularization methods such as dropout and weight decay. Results demonstrate the efficacy of Fiedler regularization. This is a journal extension of the conference paper by Tam and Dunson (2020).
    A Bayesian Framework for Causal Analysis of Recurrent Events in Presence of Immortal Risk. (arXiv:2304.03247v1 [stat.ME])
    Observational studies of recurrent event rates are common in biomedical statistics. Broadly, the goal is to estimate differences in event rates under two treatments within a defined target population over a specified followup window. Estimation with observational claims data is challenging because while membership in the target population is defined in terms of eligibility criteria, treatment is rarely assigned exactly at the time of eligibility. Ad-hoc solutions to this timing misalignment, such as assigning treatment at eligibility based on subsequent assignment, incorrectly attribute prior event rates to treatment - resulting in immortal risk bias. Even if eligibility and treatment are aligned, a terminal event process (e.g. death) often stops the recurrent event process of interest. Both processes are also censored so that events are not observed over the entire followup window. Our approach addresses misalignment by casting it as a treatment switching problem: some patients are on treatment at eligibility while others are off treatment but may switch to treatment at a specified time - if they survive long enough. We define and identify an average causal effect of switching under specified causal assumptions. Estimation is done using a g-computation framework with a joint semiparametric Bayesian model for the death and recurrent event processes. Computing the estimand for various switching times allows us to assess the impact of treatment timing. We apply the method to contrast hospitalization rates under different opioid treatment strategies among patients with chronic back pain using Medicare claims data.
    Spritz-PS: Validation of Synthetic Face Images Using a Large Dataset of Printed Documents. (arXiv:2304.02982v1 [cs.CV])
    The capability of doing effective forensic analysis on printed and scanned (PS) images is essential in many applications. PS documents may be used to conceal the artifacts of images which is due to the synthetic nature of images since these artifacts are typically present in manipulated images and the main artifacts in the synthetic images can be removed after the PS. Due to the appeal of Generative Adversarial Networks (GANs), synthetic face images generated with GANs models are difficult to differentiate from genuine human faces and may be used to create counterfeit identities. Additionally, since GANs models do not account for physiological constraints for generating human faces and their impact on human IRISes, distinguishing genuine from synthetic IRISes in the PS scenario becomes extremely difficult. As a result of the lack of large-scale reference IRIS datasets in the PS scenario, we aim at developing a novel dataset to become a standard for Multimedia Forensics (MFs) investigation which is available at [45]. In this paper, we provide a novel dataset made up of a large number of synthetic and natural printed IRISes taken from VIPPrint Printed and Scanned face images. We extracted irises from face images and it is possible that the model due to eyelid occlusion captured the incomplete irises. To fill the missing pixels of extracted iris, we applied techniques to discover the complex link between the iris images. To highlight the problems involved with the evaluation of the dataset's IRIS images, we conducted a large number of analyses employing Siamese Neural Networks to assess the similarities between genuine and synthetic human IRISes, such as ResNet50, Xception, VGG16, and MobileNet-v2. For instance, using the Xception network, we achieved 56.76\% similarity of IRISes for synthetic images and 92.77% similarity of IRISes for real images.  ( 3 min )
    The Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity. (arXiv:2209.04562v2 [cs.SI] UPDATED)
    Community detection is a classic problem in network science with extensive applications in various fields. Among numerous approaches, the most common method is modularity maximization. Despite their design philosophy and wide adoption, heuristic modularity maximization algorithms rarely return an optimal partition or anything similar. We propose a specialized algorithm, Bayan, which returns partitions with a guarantee of either optimality or proximity to an optimal partition. At the core of the Bayan algorithm is a branch-and-cut scheme that solves an integer programming formulation of the problem to optimality or approximate it within a factor. We demonstrate Bayan's distinctive accuracy and stability over 21 other algorithms in retrieving ground-truth communities in synthetic benchmarks and node labels in real networks. Bayan is several times faster than open-source and commercial solvers for modularity maximization making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Overall, our assessments point to Bayan as a suitable choice for exact maximization of modularity in networks with up to 3000 edges (in their largest connected component) and approximating maximum modularity in larger networks on ordinary computers.
    Non-contrastive representation learning for intervals from well logs. (arXiv:2209.14750v2 [cs.AI] UPDATED)
    The representation learning problem in the oil & gas industry aims to construct a model that provides a representation based on logging data for a well interval. Previous attempts are mainly supervised and focus on similarity task, which estimates closeness between intervals. We desire to build informative representations without using supervised (labelled) data. One of the possible approaches is self-supervised learning (SSL). In contrast to the supervised paradigm, this one requires little or no labels for the data. Nowadays, most SSL approaches are either contrastive or non-contrastive. Contrastive methods make representations of similar (positive) objects closer and distancing different (negative) ones. Due to possible wrong marking of positive and negative pairs, these methods can provide an inferior performance. Non-contrastive methods don't rely on such labelling and are widespread in computer vision. They learn using only pairs of similar objects that are easier to identify in logging data. We are the first to introduce non-contrastive SSL for well-logging data. In particular, we exploit Bootstrap Your Own Latent (BYOL) and Barlow Twins methods that avoid using negative pairs and focus only on matching positive pairs. The crucial part of these methods is an augmentation strategy. Our augmentation strategies and adaption of BYOL and Barlow Twins together allow us to achieve superior quality on clusterization and mostly the best performance on different classification tasks. Our results prove the usefulness of the proposed non-contrastive self-supervised approaches for representation learning and interval similarity in particular.
    Revolutionizing Single Cell Analysis: The Power of Large Language Models for Cell Type Annotation. (arXiv:2304.02697v1 [q-bio.GN])
    In recent years, single cell RNA sequencing has become a widely used technique to study cellular diversity and function. However, accurately annotating cell types from single cell data has been a challenging task, as it requires extensive knowledge of cell biology and gene function. The emergence of large language models such as ChatGPT and New Bing in 2023 has revolutionized this process by integrating the scientific literature and providing accurate annotations of cell types. This breakthrough enables researchers to conduct literature reviews more efficiently and accurately, and can potentially uncover new insights into cell type annotation. By using ChatGPT to annotate single cell data, we can relate rare cell type to their function and reveal specific differentiation trajectories of cell subtypes that were previously overlooked. This can have important applications in understanding cancer progression, mammalian development, and stem cell differentiation, and can potentially lead to the discovery of key cells that interrupt the differentiation pathway and solve key problems in the life sciences. Overall, the future of cell type annotation in single cell data looks promising and the Large Language model will be an important milestone in the history of single cell analysis.  ( 2 min )
    PREF: Predictability Regularized Neural Motion Fields. (arXiv:2209.10691v2 [cs.CV] UPDATED)
    Knowing the 3D motions in a dynamic scene is essential to many vision applications. Recent progress is mainly focused on estimating the activity of some specific elements like humans. In this paper, we leverage a neural motion field for estimating the motion of all points in a multiview setting. Modeling the motion from a dynamic scene with multiview data is challenging due to the ambiguities in points of similar color and points with time-varying color. We propose to regularize the estimated motion to be predictable. If the motion from previous frames is known, then the motion in the near future should be predictable. Therefore, we introduce a predictability regularization by first conditioning the estimated motion on latent embeddings, then by adopting a predictor network to enforce predictability on the embeddings. The proposed framework PREF (Predictability REgularized Fields) achieves on par or better results than state-of-the-art neural motion field-based dynamic scene representation methods, while requiring no prior knowledge of the scene.
    TRBoost: A Generic Gradient Boosting Machine based on Trust-region Method. (arXiv:2209.13791v3 [cs.LG] UPDATED)
    Gradient Boosting Machines (GBMs) have demonstrated remarkable success in solving diverse problems by utilizing Taylor expansions in functional space. However, achieving a balance between performance and generality has posed a challenge for GBMs. In particular, gradient descent-based GBMs employ the first-order Taylor expansion to ensure applicability to all loss functions, while Newton's method-based GBMs use positive Hessian information to achieve superior performance at the expense of generality. To address this issue, this study proposes a new generic Gradient Boosting Machine called Trust-region Boosting (TRBoost). In each iteration, TRBoost uses a constrained quadratic model to approximate the objective and applies the Trust-region algorithm to solve it and obtain a new learner. Unlike Newton's method-based GBMs, TRBoost does not require the Hessian to be positive definite, thereby allowing it to be applied to arbitrary loss functions while still maintaining competitive performance similar to second-order algorithms. The convergence analysis and numerical experiments conducted in this study confirm that TRBoost is as general as first-order GBMs and yields competitive results compared to second-order GBMs. Overall, TRBoost is a promising approach that balances performance and generality, making it a valuable addition to the toolkit of machine learning practitioners.
    The Concept of Forward-Forward Learning Applied to a Multi Output Perceptron. (arXiv:2304.03189v1 [cs.LG])
    The concept of a recently proposed Forward-Forward learning algorithm for fully connected artificial neural networks is applied to a single multi output perceptron for classification. The parameters of the system are trained with respect to increased (decreased) "goodness" for correctly (incorrectly) labelled input samples. Basic numerical tests demonstrate that the trained perceptron effectively deals with data sets that have non-linear decision boundaries. Moreover, the overall performance is comparable to more complex neural networks with hidden layers. The benefit of the approach presented here is that it only involves a single matrix multiplication.
    The Complexity of Dynamic Least-Squares Regression. (arXiv:2201.00228v2 [cs.DS] UPDATED)
    We settle the complexity of dynamic least-squares regression (LSR), where rows and labels $(\mathbf{A}^{(t)}, \mathbf{b}^{(t)})$ can be adaptively inserted and/or deleted, and the goal is to efficiently maintain an $\epsilon$-approximate solution to $\min_{\mathbf{x}^{(t)}} \| \mathbf{A}^{(t)} \mathbf{x}^{(t)} - \mathbf{b}^{(t)} \|_2$ for all $t\in [T]$. We prove sharp separations ($d^{2-o(1)}$ vs. $\sim d$) between the amortized update time of: (i) Fully vs. Partially dynamic $0.01$-LSR; (ii) High vs. low-accuracy LSR in the partially-dynamic (insertion-only) setting. Our lower bounds follow from a gap-amplification reduction -- reminiscent of iterative refinement -- rom the exact version of the Online Matrix Vector Conjecture (OMv) [HKNS15], to constant approximate OMv over the reals, where the $i$-th online product $\mathbf{H}\mathbf{v}^{(i)}$ only needs to be computed to $0.1$-relative error. All previous fine-grained reductions from OMv to its approximate versions only show hardness for inverse polynomial approximation $\epsilon = n^{-\omega(1)}$ (additive or multiplicative) . This result is of independent interest in fine-grained complexity and for the investigation of the OMv Conjecture, which is still widely open.
    Accurate Node Feature Estimation with Structured Variational Graph Autoencoder. (arXiv:2206.04516v2 [cs.LG] UPDATED)
    Given a graph with partial observations of node features, how can we estimate the missing features accurately? Feature estimation is a crucial problem for analyzing real-world graphs whose features are commonly missing during the data collection process. Accurate estimation not only provides diverse information of nodes but also supports the inference of graph neural networks that require the full observation of node features. However, designing an effective approach for estimating high-dimensional features is challenging, since it requires an estimator to have large representation power, increasing the risk of overfitting. In this work, we propose SVGA (Structured Variational Graph Autoencoder), an accurate method for feature estimation. SVGA applies strong regularization to the distribution of latent variables by structured variational inference, which models the prior of variables as Gaussian Markov random field based on the graph structure. As a result, SVGA combines the advantages of probabilistic inference and graph neural networks, achieving state-of-the-art performance in real datasets.
    Bayesian Optimization with Conformal Prediction Sets. (arXiv:2210.12496v3 [cs.LG] UPDATED)
    Bayesian optimization is a coherent, ubiquitous approach to decision-making under uncertainty, with applications including multi-arm bandits, active learning, and black-box optimization. Bayesian optimization selects decisions (i.e. objective function queries) with maximal expected utility with respect to the posterior distribution of a Bayesian model, which quantifies reducible, epistemic uncertainty about query outcomes. In practice, subjectively implausible outcomes can occur regularly for two reasons: 1) model misspecification and 2) covariate shift. Conformal prediction is an uncertainty quantification method with coverage guarantees even for misspecified models and a simple mechanism to correct for covariate shift. We propose conformal Bayesian optimization, which directs queries towards regions of search space where the model predictions have guaranteed validity, and investigate its behavior on a suite of black-box optimization tasks and tabular ranking tasks. In many cases we find that query coverage can be significantly improved without harming sample-efficiency.
    Hierarchical Graph Neural Network with Cross-Attention for Cross-Device User Matching. (arXiv:2304.03215v1 [cs.LG])
    Cross-device user matching is a critical problem in numerous domains, including advertising, recommender systems, and cybersecurity. It involves identifying and linking different devices belonging to the same person, utilizing sequence logs. Previous data mining techniques have struggled to address the long-range dependencies and higher-order connections between the logs. Recently, researchers have modeled this problem as a graph problem and proposed a two-tier graph contextual embedding (TGCE) neural network architecture, which outperforms previous methods. In this paper, we propose a novel hierarchical graph neural network architecture (HGNN), which has a more computationally efficient second level design than TGCE. Furthermore, we introduce a cross-attention (Cross-Att) mechanism in our model, which improves performance by 5% compared to the state-of-the-art TGCE method.
    Synthetic Data in Healthcare. (arXiv:2304.03243v1 [cs.AI])
    Synthetic data are becoming a critical tool for building artificially intelligent systems. Simulators provide a way of generating data systematically and at scale. These data can then be used either exclusively, or in conjunction with real data, for training and testing systems. Synthetic data are particularly attractive in cases where the availability of ``real'' training examples might be a bottleneck. While the volume of data in healthcare is growing exponentially, creating datasets for novel tasks and/or that reflect a diverse set of conditions and causal relationships is not trivial. Furthermore, these data are highly sensitive and often patient specific. Recent research has begun to illustrate the potential for synthetic data in many areas of medicine, but no systematic review of the literature exists. In this paper, we present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine. We discuss that while synthetics can promote privacy, equity, safety and continual and causal learning, they also run the risk of introducing flaws, blind spots and propagating or exaggerating biases.
    Convolutional Neural Networks with Intermediate Loss for 3D Super-Resolution of CT and MRI Scans. (arXiv:2001.01330v4 [cs.CV] UPDATED)
    CT scanners that are commonly-used in hospitals nowadays produce low-resolution images, up to 512 pixels in size. One pixel in the image corresponds to a one millimeter piece of tissue. In order to accurately segment tumors and make treatment plans, doctors need CT scans of higher resolution. The same problem appears in MRI. In this paper, we propose an approach for the single-image super-resolution of 3D CT or MRI scans. Our method is based on deep convolutional neural networks (CNNs) composed of 10 convolutional layers and an intermediate upscaling layer that is placed after the first 6 convolutional layers. Our first CNN, which increases the resolution on two axes (width and height), is followed by a second CNN, which increases the resolution on the third axis (depth). Different from other methods, we compute the loss with respect to the ground-truth high-resolution output right after the upscaling layer, in addition to computing the loss after the last convolutional layer. The intermediate loss forces our network to produce a better output, closer to the ground-truth. A widely-used approach to obtain sharp results is to add Gaussian blur using a fixed standard deviation. In order to avoid overfitting to a fixed standard deviation, we apply Gaussian smoothing with various standard deviations, unlike other approaches. We evaluate our method in the context of 2D and 3D super-resolution of CT and MRI scans from two databases, comparing it to relevant related works from the literature and baselines based on various interpolation schemes, using 2x and 4x scaling factors. The empirical results show that our approach attains superior results to all other methods. Moreover, our human annotation study reveals that both doctors and regular annotators chose our method in favor of Lanczos interpolation in 97.55% cases for 2x upscaling factor and in 96.69% cases for 4x upscaling factor.
    Learning to Learn with Indispensable Connections. (arXiv:2304.02862v1 [cs.LG])
    Meta-learning aims to solve unseen tasks with few labelled instances. Nevertheless, despite its effectiveness for quick learning in existing optimization-based methods, it has several flaws. Inconsequential connections are frequently seen during meta-training, which results in an over-parameterized neural network. Because of this, meta-testing observes unnecessary computations and extra memory overhead. To overcome such flaws. We propose a novel meta-learning method called Meta-LTH that includes indispensible (necessary) connections. We applied the lottery ticket hypothesis technique known as magnitude pruning to generate these crucial connections that can effectively solve few-shot learning problem. We aim to perform two things: (a) to find a sub-network capable of more adaptive meta-learning and (b) to learn new low-level features of unseen tasks and recombine those features with the already learned features during the meta-test phase. Experimental results show that our proposed Met-LTH method outperformed existing first-order MAML algorithm for three different classification datasets. Our method improves the classification accuracy by approximately 2% (20-way 1-shot task setting) for omniglot dataset.
    Synthetic Hard Negative Samples for Contrastive Learning. (arXiv:2304.02971v1 [cs.CV])
    Contrastive learning has emerged as an essential approach for self-supervised learning in computer vision. The central objective of contrastive learning is to maximize the similarities between two augmented versions of the same image (positive pairs), while minimizing the similarities between different images (negative pairs). Recent studies have demonstrated that harder negative samples, i.e., those that are difficult to distinguish from anchor sample, play a more critical role in contrastive learning. In this paper, we propose a novel featurelevel method, namely sampling synthetic hard negative samples for contrastive learning (SSCL), to exploit harder negative samples more effectively. Specifically, 1) we generate more and harder negative samples by mixing negative samples, and then sample them by controlling the contrast of anchor sample with the other negative samples. 2) Considering that the negative samples obtained by sampling may have the problem of false negative samples, we further debias the negative samples. Our proposed method improves the classification performance on different image datasets and can be readily applied to existing methods.
    Assessing the Reproducibility of Machine-learning-based Biomarker Discovery in Parkinson's Disease. (arXiv:2304.03239v1 [q-bio.GN])
    Genome-Wide Association Studies (GWAS) help identify genetic variations in people with diseases such as Parkinson's disease (PD), which are less common in those without the disease. Thus, GWAS data can be used to identify genetic variations associated with the disease. Feature selection and machine learning approaches can be used to analyze GWAS data and identify potential disease biomarkers. However, GWAS studies have technical variations that affect the reproducibility of identified biomarkers, such as differences in genotyping platforms and selection criteria for individuals to be genotyped. To address this issue, we collected five GWAS datasets from the database of Genotypes and Phenotypes (dbGaP) and explored several data integration strategies. We evaluated the agreement among different strategies in terms of the Single Nucleotide Polymorphisms (SNPs) that were identified as potential PD biomarkers. Our results showed a low concordance of biomarkers discovered using different datasets or integration strategies. However, we identified fifty SNPs that were identified at least twice, which could potentially serve as novel PD biomarkers. These SNPs are indirectly linked to PD in the literature but have not been directly associated with PD before. These findings open up new potential avenues of investigation.
    Constrained Exploration in Reinforcement Learning with Optimality Preservation. (arXiv:2304.03104v1 [cs.LG])
    We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space to find an optimal policy while adhering to some restriction on its behavior. Such restriction may prevent the agent from visiting some state-action pairs, possibly leading to the agent finding only a sub-optimal policy. To address this problem we introduce the concept of constrained exploration with optimality preservation, whereby the exploration behavior of the agent is constrained to meet a specification while the optimality of the (original) unconstrained learning process is preserved. We first establish a feedback-control structure that models the dynamics of the unconstrained learning process. We then extend this structure by adding a supervisor to ensure that the behavior of the agent meets the specification, and establish (for a class of reinforcement-learning problems with a known deterministic environment) a necessary and sufficient condition under which optimality is preserved. This work demonstrates the utility and the prospect of studying reinforcement-learning problems in the context of the theories of discrete-event systems, automata and formal languages.
    Optimism and Delays in Episodic Reinforcement Learning. (arXiv:2111.07615v2 [cs.LG] UPDATED)
    There are many algorithms for regret minimisation in episodic reinforcement learning. This problem is well-understood from a theoretical perspective, providing that the sequences of states, actions and rewards associated with each episode are available to the algorithm updating the policy immediately after every interaction with the environment. However, feedback is almost always delayed in practice. In this paper, we study the impact of delayed feedback in episodic reinforcement learning from a theoretical perspective and propose two general-purpose approaches to handling the delays. The first involves updating as soon as new information becomes available, whereas the second waits before using newly observed information to update the policy. For the class of optimistic algorithms and either approach, we show that the regret increases by an additive term involving the number of states, actions, episode length, the expected delay and an algorithm-dependent constant. We empirically investigate the impact of various delay distributions on the regret of optimistic algorithms to validate our theoretical results.
    Deep Long-Short Term Memory networks: Stability properties and Experimental validation. (arXiv:2304.02975v1 [eess.SY])
    The aim of this work is to investigate the use of Incrementally Input-to-State Stable ($\delta$ISS) deep Long Short Term Memory networks (LSTMs) for the identification of nonlinear dynamical systems. We show that suitable sufficient conditions on the weights of the network can be leveraged to setup a training procedure able to learn provenly-$\delta$ISS LSTM models from data. The proposed approach is tested on a real brake-by-wire apparatus to identify a model of the system from input-output experimentally collected data. Results show satisfactory modeling performances.
    An experimental study in Real-time Facial Emotion Recognition on new 3RL dataset. (arXiv:2304.03064v1 [cs.CV])
    Although real-time facial emotion recognition is a hot topic research domain in the field of human-computer interaction, state-of the-art available datasets still suffer from various problems, such as some unrelated photos such as document photos, unbalanced numbers of photos in each class, and misleading images that can negatively affect correct classification. The 3RL dataset was created, which contains approximately 24K images and will be publicly available, to overcome previously available dataset problems. The 3RL dataset is labelled with five basic emotions: happiness, fear, sadness, disgust, and anger. Moreover, we compared the 3RL dataset with other famous state-of-the-art datasets (FER dataset, CK+ dataset), and we applied the most commonly used algorithms in previous works, SVM and CNN. The results show a noticeable improvement in generalization on the 3RL dataset. Experiments have shown an accuracy of up to 91.4% on 3RL dataset using CNN where results on FER2013, CK+ are, respectively (approximately from 60% to 85%).
    Implicit Anatomical Rendering for Medical Image Segmentation with Stochastic Experts. (arXiv:2304.03209v1 [cs.CV])
    Integrating high-level semantically correlated contents and low-level anatomical features is of central importance in medical image segmentation. Towards this end, recent deep learning-based medical segmentation methods have shown great promise in better modeling such information. However, convolution operators for medical segmentation typically operate on regular grids, which inherently blur the high-frequency regions, i.e., boundary regions. In this work, we propose MORSE, a generic implicit neural rendering framework designed at an anatomical level to assist learning in medical image segmentation. Our method is motivated by the fact that implicit neural representation has been shown to be more effective in fitting complex signals and solving computer graphics problems than discrete grid-based representation. The core of our approach is to formulate medical image segmentation as a rendering problem in an end-to-end manner. Specifically, we continuously align the coarse segmentation prediction with the ambiguous coordinate-based point representations and aggregate these features to adaptively refine the boundary region. To parallelly optimize multi-scale pixel-level features, we leverage the idea from Mixture-of-Expert (MoE) to design and train our MORSE with a stochastic gating mechanism. Our experiments demonstrate that MORSE can work well with different medical segmentation backbones, consistently achieving competitive performance improvements in both 2D and 3D supervised medical segmentation methods. We also theoretically analyze the superiority of MORSE.
    Spectral Toolkit of Algorithms for Graphs: Technical Report (1). (arXiv:2304.03170v1 [cs.SI])
    Spectral Toolkit of Algorithms for Graphs (STAG) is an open-source library for efficient spectral graph algorithms, and its development starts in September 2022. We have so far finished the component on local graph clustering, and this technical report presents a user's guide to STAG, showcase studies, and several technical considerations behind our development.
    Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine. (arXiv:2209.06261v3 [cs.RO] UPDATED)
    Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and significant deformations, which enable them to navigate unstructured terrains and survive harsh impacts. They are hard to control, however, due to high dimensionality, complex dynamics, and a coupled architecture. Physics-based simulation is a promising avenue for developing locomotion policies that can be transferred to real robots. Nevertheless, modeling tensegrity robots is a complex task due to a substantial sim2real gap. To address this issue, this paper describes a Real2Sim2Real (R2S2R) strategy for tensegrity robots. This strategy is based on a differentiable physics engine that can be trained given limited data from a real robot. These data include offline measurements of physical properties, such as mass and geometry for various robot components, and the observation of a trajectory using a random control policy. With the data from the real robot, the engine can be iteratively refined and used to discover locomotion policies that are directly transferable to the real robot. Beyond the R2S2R pipeline, key contributions of this work include computing non-zero gradients at contact points, a loss function for matching tensegrity locomotion gaits, and a trajectory segmentation technique that avoids conflicts in gradient evaluation during training. Multiple iterations of the R2S2R process are demonstrated and evaluated on a real 3-bar tensegrity robot.
    Improving automatic endoscopic stone recognition using a multi-view fusion approach enhanced with two-step transfer learning. (arXiv:2304.03193v1 [eess.IV])
    This contribution presents a deep-learning method for extracting and fusing image information acquired from different viewpoints, with the aim to produce more discriminant object features for the identification of the type of kidney stones seen in endoscopic images. The model was further improved with a two-step transfer learning approach and by attention blocks to refine the learned feature maps. Deep feature fusion strategies improved the results of single view extraction backbone models by more than 6% in terms of accuracy of the kidney stones classification.
    Variable-Complexity Weighted-Tempered Gibbs Samplers for Bayesian Variable Selection. (arXiv:2304.02899v1 [stat.ML])
    Subset weighted-Tempered Gibbs Sampler (wTGS) has been recently introduced by Jankowiak to reduce the computation complexity per MCMC iteration in high-dimensional applications where the exact calculation of the posterior inclusion probabilities (PIP) is not essential. However, the Rao-Backwellized estimator associated with this sampler has a high variance as the ratio between the signal dimension and the number of conditional PIP estimations is large. In this paper, we design a new subset weighted-Tempered Gibbs Sampler (wTGS) where the expected number of computations of conditional PIPs per MCMC iteration can be much smaller than the signal dimension. Different from the subset wTGS and wTGS, our sampler has a variable complexity per MCMC iteration. We provide an upper bound on the variance of an associated Rao-Blackwellized estimator for this sampler at a finite number of iterations, $T$, and show that the variance is $O\big(\big(\frac{P}{S}\big)^2 \frac{\log T}{T}\big)$ for a given dataset where $S$ is the expected number of conditional PIP computations per MCMC iteration. Experiments show that our Rao-Blackwellized estimator can have a smaller variance than its counterpart associated with the subset wTGS.
    PopulAtion Parameter Averaging (PAPA). (arXiv:2304.03094v1 [cs.LG])
    Ensemble methods combine the predictions of multiple models to improve performance, but they require significantly higher computation costs at inference time. To avoid these costs, multiple neural networks can be combined into one by averaging their weights (model soups). However, this usually performs significantly worse than ensembling. Weight averaging is only beneficial when weights are similar enough (in weight or feature space) to average well but different enough to benefit from combining them. Based on this idea, we propose PopulAtion Parameter Averaging (PAPA): a method that combines the generality of ensembling with the efficiency of weight averaging. PAPA leverages a population of diverse models (trained on different data orders, augmentations, and regularizations) while occasionally (not too often, not too rarely) replacing the weights of the networks with the population average of the weights. PAPA reduces the performance gap between averaging and ensembling, increasing the average accuracy of a population of models by up to 1.1% on CIFAR-10, 2.4% on CIFAR-100, and 1.9% on ImageNet when compared to training independent (non-averaged) models.
    Modelling customer lifetime-value in the retail banking industry. (arXiv:2304.03038v1 [cs.LG])
    Understanding customer lifetime value is key to nurturing long-term customer relationships, however, estimating it is far from straightforward. In the retail banking industry, commonly used approaches rely on simple heuristics and do not take advantage of the high predictive ability of modern machine learning techniques. We present a general framework for modelling customer lifetime value which may be applied to industries with long-lasting contractual and product-centric customer relationships, of which retail banking is an example. This framework is novel in facilitating CLV predictions over arbitrary time horizons and product-based propensity models. We also detail an implementation of this model which is currently in production at a large UK lender. In testing, we estimate an 43% improvement in out-of-time CLV prediction error relative to a popular baseline approach. Propensity models derived from our CLV model have been used to support customer contact marketing campaigns. In testing, we saw that the top 10% of customers ranked by their propensity to take up investment products were 3.2 times more likely to take up an investment product in the next year than a customer chosen at random.
    Missingness Augmentation: A General Approach for Improving Generative Imputation Models. (arXiv:2108.02566v2 [cs.LG] UPDATED)
    Missing data imputation is a fundamental problem in data analysis, and many studies have been conducted to improve its performance by exploring model structures and learning procedures. However, data augmentation, as a simple yet effective method, has not received enough attention in this area. In this paper, we propose a novel data augmentation method called Missingness Augmentation (MisA) for generative imputation models. Our approach dynamically produces incomplete samples at each epoch by utilizing the generator's output, constraining the augmented samples using a simple reconstruction loss, and combining this loss with the original loss to form the final optimization objective. As a general augmentation technique, MisA can be easily integrated into generative imputation frameworks, providing a simple yet effective way to enhance their performance. Experimental results demonstrate that MisA significantly improves the performance of many recently proposed generative imputation models on a variety of tabular and image datasets. The code is available at \url{https://github.com/WYu-Feng/Missingness-Augmentation}.
    Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach. (arXiv:2304.02893v1 [cs.RO])
    We focus on the task of language-conditioned object placement, in which a robot should generate placements that satisfy all the spatial relational constraints in language instructions. Previous works based on rule-based language parsing or scene-centric visual representation have restrictions on the form of instructions and reference objects or require large amounts of training data. We propose an object-centric framework that leverages foundation models to ground the reference objects and spatial relations for placement, which is more sample efficient and generalizable. Experiments indicate that our model can achieve a 97.75% success rate of placement with only ~0.26M trainable parameters. Besides, our method generalizes better to both unseen objects and instructions. Moreover, with only 25% training data, we still outperform the top competing approach.
    Classification of Superstatistical Features in High Dimensions. (arXiv:2304.02912v1 [stat.ML])
    We characterise the learning of a mixture of two clouds of data points with generic centroids via empirical risk minimisation in the high dimensional regime, under the assumptions of generic convex loss and convex regularisation. Each cloud of data points is obtained by sampling from a possibly uncountable superposition of Gaussian distributions, whose variance has a generic probability density $\varrho$. Our analysis covers therefore a large family of data distributions, including the case of power-law-tailed distributions with no covariance. We study the generalisation performance of the obtained estimator, we analyse the role of regularisation, and the dependence of the separability transition on the distribution scale parameters.
    Semi-decentralized Inference in Heterogeneous Graph Neural Networks for Traffic Demand Forecasting: An Edge-Computing Approach. (arXiv:2303.00524v2 [cs.LG] UPDATED)
    Prediction of taxi service demand and supply is essential for improving customer's experience and provider's profit. Recently, graph neural networks (GNNs) have been shown promising for this application. This approach models city regions as nodes in a transportation graph and their relations as edges. GNNs utilize local node features and the graph structure in the prediction. However, more efficient forecasting can still be achieved by following two main routes; enlarging the scale of the transportation graph, and simultaneously exploiting different types of nodes and edges in the graphs. However, both approaches are challenged by the scalability of GNNs. An immediate remedy to the scalability challenge is to decentralize the GNN operation. However, this creates excessive node-to-node communication. In this paper, we first characterize the excessive communication needs for the decentralized GNN approach. Then, we propose a semi-decentralized approach utilizing multiple cloudlets, moderately sized storage and computation devices, that can be integrated with the cellular base stations. This approach minimizes inter-cloudlet communication thereby alleviating the communication overhead of the decentralized approach while promoting scalability due to cloudlet-level decentralization. Also, we propose a heterogeneous GNN-LSTM algorithm for improved taxi-level demand and supply forecasting for handling dynamic taxi graphs where nodes are taxis. Extensive experiments over real data show the advantage of the semi-decentralized approach as tested over our heterogeneous GNN-LSTM algorithm. Also, the proposed semi-decentralized GNN approach is shown to reduce the overall inference time by about an order of magnitude compared to centralized and decentralized inference schemes.
    Expert-Independent Generalization of Well and Seismic Data Using Machine Learning Methods for Complex Reservoirs Predicting During Early-Stage Geological Exploration. (arXiv:2304.03048v1 [physics.geo-ph])
    The aim of this study is to develop and apply an autonomous approach for predicting the probability of hydrocarbon reservoirs spreading in the studied area. Autonomy means that after preparing and inputting geological-geophysical information, the influence of an expert on the algorithms is minimized. The study was made based on the 3D seismic survey data and well information on the early exploration stage of the studied field. As a result, a forecast of the probability of spatial distribution of reservoirs was made for two sets of input data: the base set and the set after reverse-calibration, and three-dimensional cubes of calibrated probabilities of belonging of the studied space to the identified classes were obtained. The approach presented in the paper allows for expert-independent generalization of geological and geophysical data, and to use this generalization for hypothesis testing and creating geological models based on a probabilistic representation of the reservoir. The quality of the probabilistic representation depends on the quality and quantity of the input data. Depending on the input data, the approach can be a useful tool for exploration and prospecting of geological objects, identifying potential resources, optimizing and designing field development.
    Safe MDP Planning by Learning Temporal Patterns of Undesirable Trajectories and Averting Negative Side Effects. (arXiv:2304.03081v1 [cs.LG])
    In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects. In the real world, often the state representation used may lack sufficient fidelity to specify such safety constraints. Operating based on an incomplete model can often produce unintended negative side effects (NSEs). To address these challenges, first, we associate safety signals with state-action trajectories (rather than just an immediate state-action). This makes our safety model highly general. We also assume categorical safety labels are given for different trajectories, rather than a numerical cost function, which is harder to specify by the problem designer. We then employ a supervised learning model to learn such non-Markovian safety patterns. Second, we develop a Lagrange multiplier method, which incorporates the safety model and the underlying MDP model in a single computation graph to facilitate agent learning of safe behaviors. Finally, our empirical results on a variety of discrete and continuous domains show that this approach can satisfy complex non-Markovian safety constraints while optimizing an agent's total returns, is highly scalable, and is also better than the previous best approach for Markovian NSEs.
    Automatically Bounding the Taylor Remainder Series: Tighter Bounds and New Applications. (arXiv:2212.11429v2 [cs.LG] UPDATED)
    We present a new algorithm for automatically bounding the Taylor remainder series. In the special case of a scalar function $f: \mathbb{R} \to \mathbb{R}$, our algorithm takes as input a reference point $x_0$, trust region $[a, b]$, and integer $k \ge 1$, and returns an interval $I$ such that $f(x) - \sum_{i=0}^{k-1} \frac {1} {i!} f^{(i)}(x_0) (x - x_0)^i \in I (x - x_0)^k$ for all $x \in [a, b]$. As in automatic differentiation, the function $f$ is provided to the algorithm in symbolic form, and must be composed of known atomic functions. At a high level, our algorithm has two steps. First, for a variety of commonly-used elementary functions (e.g., $\exp$, $\log$), we derive sharp polynomial upper and lower bounds on the Taylor remainder series. We then recursively combine the bounds for the elementary functions using an interval arithmetic variant of Taylor-mode automatic differentiation. Our algorithm can make efficient use of machine learning hardware accelerators, and we provide an open source implementation in JAX. We then turn our attention to applications. Most notably, we use our new machinery to create the first universal majorization-minimization optimization algorithms: algorithms that iteratively minimize an arbitrary loss using a majorizer that is derived automatically, rather than by hand. Applied to machine learning, this leads to architecture-specific optimizers for training deep networks that converge from any starting point, without hyperparameter tuning. Our experiments show that for some optimization problems, these hyperparameter-free optimizers outperform tuned versions of gradient descent, Adam, and AdaGrad. We also show that our automatically-derived bounds can be used for verified global optimization and numerical integration, and to prove sharper versions of Jensen's inequality.
    Convolutional neural networks for crack detection on flexible road pavements. (arXiv:2304.02933v1 [cs.CV])
    Flexible road pavements deteriorate primarily due to traffic and adverse environmental conditions. Cracking is the most common deterioration mechanism; the surveying thereof is typically conducted manually using internationally defined classification standards. In South Africa, the use of high-definition video images has been introduced, which allows for safer road surveying. However, surveying is still a tedious manual process. Automation of the detection of defects such as cracks would allow for faster analysis of road networks and potentially reduce human bias and error. This study performs a comparison of six state-of-the-art convolutional neural network models for the purpose of crack detection. The models are pretrained on the ImageNet dataset, and fine-tuned using a new real-world binary crack dataset consisting of 14000 samples. The effects of dataset augmentation are also investigated. Of the six models trained, five achieved accuracy above 97%. The highest recorded accuracy was 98%, achieved by the ResNet and VGG16 models. The dataset is available at the following URL: https://zenodo.org/record/7795975
    Protecting User Privacy in Online Settings via Supervised Learning. (arXiv:2304.02870v1 [cs.CR])
    Companies that have an online presence-in particular, companies that are exclusively digital-often subscribe to this business model: collect data from the user base, then expose the data to advertisement agencies in order to turn a profit. Such companies routinely market a service as "free", while obfuscating the fact that they tend to "charge" users in the currency of personal information rather than money. However, online companies also gather user data for more principled purposes, such as improving the user experience and aggregating statistics. The problem is the sale of user data to third parties. In this work, we design an intelligent approach to online privacy protection that leverages supervised learning. By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user. In our evaluation, we collect a dataset of network requests and measure the performance of several classifiers that adhere to the supervised learning paradigm. The results of our evaluation demonstrate the feasibility and potential of our approach.
    Perfect is the enemy of test oracle. (arXiv:2302.01488v2 [cs.SE] UPDATED)
    Automation of test oracles is one of the most challenging facets of software testing, but remains comparatively less addressed compared to automated test input generation. Test oracles rely on a ground-truth that can distinguish between the correct and buggy behavior to determine whether a test fails (detects a bug) or passes. What makes the oracle problem challenging and undecidable is the assumption that the ground-truth should know the exact expected, correct, or buggy behavior. However, we argue that one can still build an accurate oracle without knowing the exact correct or buggy behavior, but how these two might differ. This paper presents SEER, a learning-based approach that in the absence of test assertions or other types of oracle, can determine whether a unit test passes or fails on a given method under test (MUT). To build the ground-truth, SEER jointly embeds unit tests and the implementation of MUTs into a unified vector space, in such a way that the neural representation of tests are similar to that of MUTs they pass on them, but dissimilar to MUTs they fail on them. The classifier built on top of this vector representation serves as the oracle to generate "fail" labels, when test inputs detect a bug in MUT or "pass" labels, otherwise. Our extensive experiments on applying SEER to more than 5K unit tests from a diverse set of open-source Java projects show that the produced oracle is (1) effective in predicting the fail or pass labels, achieving an overall accuracy, precision, recall, and F1 measure of 93%, 86%, 94%, and 90%, (2) generalizable, predicting the labels for the unit test of projects that were not in training or validation set with negligible performance drop, and (3) efficient, detecting the existence of bugs in only 6.5 milliseconds on average.
    Ollivier-Ricci Curvature for Hypergraphs: A Unified Framework. (arXiv:2210.12048v3 [cs.LG] UPDATED)
    Bridging geometry and topology, curvature is a powerful and expressive invariant. While the utility of curvature has been theoretically and empirically confirmed in the context of manifolds and graphs, its generalization to the emerging domain of hypergraphs has remained largely unexplored. On graphs, the Ollivier-Ricci curvature measures differences between random walks via Wasserstein distances, thus grounding a geometric concept in ideas from probability theory and optimal transport. We develop ORCHID, a flexible framework generalizing Ollivier-Ricci curvature to hypergraphs, and prove that the resulting curvatures have favorable theoretical properties. Through extensive experiments on synthetic and real-world hypergraphs from different domains, we demonstrate that ORCHID curvatures are both scalable and useful to perform a variety of hypergraph tasks in practice.
    Learning Cautiously in Federated Learning with Noisy and Heterogeneous Clients. (arXiv:2304.02892v1 [cs.LG])
    Federated learning (FL) is a distributed framework for collaboratively training with privacy guarantees. In real-world scenarios, clients may have Non-IID data (local class imbalance) with poor annotation quality (label noise). The co-existence of label noise and class imbalance in FL's small local datasets renders conventional FL methods and noisy-label learning methods both ineffective. To address the challenges, we propose FedCNI without using an additional clean proxy dataset. It includes a noise-resilient local solver and a robust global aggregator. For the local solver, we design a more robust prototypical noise detector to distinguish noisy samples. Further to reduce the negative impact brought by the noisy samples, we devise a curriculum pseudo labeling method and a denoise Mixup training strategy. For the global aggregator, we propose a switching re-weighted aggregation method tailored to different learning periods. Extensive experiments demonstrate our method can substantially outperform state-of-the-art solutions in mix-heterogeneous FL environments.
    Unconstrained Parametrization of Dissipative and Contracting Neural Ordinary Differential Equations. (arXiv:2304.02976v1 [eess.SY])
    In this work, we introduce and study a class of Deep Neural Networks (DNNs) in continuous-time. The proposed architecture stems from the combination of Neural Ordinary Differential Equations (Neural ODEs) with the model structure of recently introduced Recurrent Equilibrium Networks (RENs). We show how to endow our proposed NodeRENs with contractivity and dissipativity -- crucial properties for robust learning and control. Most importantly, as for RENs, we derive parametrizations of contractive and dissipative NodeRENs which are unconstrained, hence enabling their learning for a large number of parameters. We validate the properties of NodeRENs, including the possibility of handling irregularly sampled data, in a case study in nonlinear system identification.
    Pruning Deep Neural Networks from a Sparsity Perspective. (arXiv:2302.05601v2 [cs.LG] UPDATED)
    In recent years, deep network pruning has attracted significant attention in order to enable the rapid deployment of AI into small devices with computation and memory constraints. Pruning is often achieved by dropping redundant weights, neurons, or layers of a deep network while attempting to retain a comparable test performance. Many deep pruning algorithms have been proposed with impressive empirical success. However, existing approaches lack a quantifiable measure to estimate the compressibility of a sub-network during each pruning iteration and thus may under-prune or over-prune the model. In this work, we propose PQ Index (PQI) to measure the potential compressibility of deep neural networks and use this to develop a Sparsity-informed Adaptive Pruning (SAP) algorithm. Our extensive experiments corroborate the hypothesis that for a generic pruning procedure, PQI decreases first when a large model is being effectively regularized and then increases when its compressibility reaches a limit that appears to correspond to the beginning of underfitting. Subsequently, PQI decreases again when the model collapse and significant deterioration in the performance of the model start to occur. Additionally, our experiments demonstrate that the proposed adaptive pruning algorithm with proper choice of hyper-parameters is superior to the iterative pruning algorithms such as the lottery ticket-based pruning methods, in terms of both compression efficiency and robustness.
    Deep Reinforcement Learning Based Vehicle Selection for Asynchronous Federated Learning Enabled Vehicular Edge Computing. (arXiv:2304.02832v1 [cs.LG])
    In the traditional vehicular network, computing tasks generated by the vehicles are usually uploaded to the cloud for processing. However, since task offloading toward the cloud will cause a large delay, vehicular edge computing (VEC) is introduced to avoid such a problem and improve the whole system performance, where a roadside unit (RSU) with certain computing capability is used to process the data of vehicles as an edge entity. Owing to the privacy and security issues, vehicles are reluctant to upload local data directly to the RSU, and thus federated learning (FL) becomes a promising technology for some machine learning tasks in VEC, where vehicles only need to upload the local model hyperparameters instead of transferring their local data to the nearby RSU. Furthermore, as vehicles have different local training time due to various sizes of local data and their different computing capabilities, asynchronous federated learning (AFL) is employed to facilitate the RSU to update the global model immediately after receiving a local model to reduce the aggregation delay. However, in AFL of VEC, different vehicles may have different impact on the global model updating because of their various local training delay, transmission delay and local data sizes. Also, if there are bad nodes among the vehicles, it will affect the global aggregation quality at the RSU. To solve the above problem, we shall propose a deep reinforcement learning (DRL) based vehicle selection scheme to improve the accuracy of the global model in AFL of vehicular network. In the scheme, we present the model including the state, action and reward in the DRL based to the specific problem. Simulation results demonstrate our scheme can effectively remove the bad nodes and improve the aggregation accuracy of the global model.
    Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions. (arXiv:2304.02868v1 [cs.CL])
    Large language models (LLMs) such as ChatGPT and GPT-4 have recently demonstrated their remarkable abilities of communicating with human users. In this technical report, we take an initiative to investigate their capacities of playing text games, in which a player has to understand the environment and respond to situations by having dialogues with the game world. Our experiments show that ChatGPT performs competitively compared to all the existing systems but still exhibits a low level of intelligence. Precisely, ChatGPT can not construct the world model by playing the game or even reading the game manual; it may fail to leverage the world knowledge that it already has; it cannot infer the goal of each step as the game progresses. Our results open up new research questions at the intersection of artificial intelligence, machine learning, and natural language processing.
    FedBot: Enhancing Privacy in Chatbots with Federated Learning. (arXiv:2304.03228v1 [cs.CL])
    Chatbots are mainly data-driven and usually based on utterances that might be sensitive. However, training deep learning models on shared data can violate user privacy. Such issues have commonly existed in chatbots since their inception. In the literature, there have been many approaches to deal with privacy, such as differential privacy and secure multi-party computation, but most of them need to have access to users' data. In this context, Federated Learning (FL) aims to protect data privacy through distributed learning methods that keep the data in its location. This paper presents Fedbot, a proof-of-concept (POC) privacy-preserving chatbot that leverages large-scale customer support data. The POC combines Deep Bidirectional Transformer models and federated learning algorithms to protect customer data privacy during collaborative model training. The results of the proof-of-concept showcase the potential for privacy-preserving chatbots to transform the customer support industry by delivering personalized and efficient customer service that meets data privacy regulations and legal requirements. Furthermore, the system is specifically designed to improve its performance and accuracy over time by leveraging its ability to learn from previous interactions.
    A Transformer-Based Deep Learning Approach for Fairly Predicting Post-Liver Transplant Risk Factors. (arXiv:2304.02780v1 [cs.LG])
    Liver transplantation is a life-saving procedure for patients with end-stage liver disease. There are two main challenges in liver transplant: finding the best matching patient for a donor and ensuring transplant equity among different subpopulations. The current MELD scoring system evaluates a patient's mortality risk if not receiving an organ within 90 days. However, the donor-patient matching should also take into consideration post-transplant risk factors, such as cardiovascular disease, chronic rejection, etc., which are all common complications after transplant. Accurate prediction of these risk scores remains a significant challenge. In this study, we will use predictive models to solve the above challenge. We propose a deep learning framework model to predict multiple risk factors after a liver transplant. By formulating it as a multi-task learning problem, the proposed deep neural network was trained on this data to simultaneously predict the five post-transplant risks and achieve equally good performance by leveraging task balancing techniques. We also propose a novel fairness achieving algorithm and to ensure prediction fairness across different subpopulations. We used electronic health records of 160,360 liver transplant patients, including demographic information, clinical variables, and laboratory values, collected from the liver transplant records of the United States from 1987 to 2018. The performance of the model was evaluated using various performance metrics such as AUROC, AURPC, and accuracy. The results of our experiments demonstrate that the proposed multitask prediction model achieved high accuracy and good balance in predicting all five post-transplant risk factors, with a maximum accuracy discrepancy of only 2.7%. The fairness-achieving algorithm significantly reduced the fairness disparity compared to the baseline model.
    Causal Repair of Learning-enabled Cyber-physical Systems. (arXiv:2304.02813v1 [eess.SY])
    Models of actual causality leverage domain knowledge to generate convincing diagnoses of events that caused an outcome. It is promising to apply these models to diagnose and repair run-time property violations in cyber-physical systems (CPS) with learning-enabled components (LEC). However, given the high diversity and complexity of LECs, it is challenging to encode domain knowledge (e.g., the CPS dynamics) in a scalable actual causality model that could generate useful repair suggestions. In this paper, we focus causal diagnosis on the input/output behaviors of LECs. Specifically, we aim to identify which subset of I/O behaviors of the LEC is an actual cause for a property violation. An important by-product is a counterfactual version of the LEC that repairs the run-time property by fixing the identified problematic behaviors. Based on this insights, we design a two-step diagnostic pipeline: (1) construct and Halpern-Pearl causality model that reflects the dependency of property outcome on the component's I/O behaviors, and (2) perform a search for an actual cause and corresponding repair on the model. We prove that our pipeline has the following guarantee: if an actual cause is found, the system is guaranteed to be repaired; otherwise, we have high probabilistic confidence that the LEC under analysis did not cause the property violation. We demonstrate that our approach successfully repairs learned controllers on a standard OpenAI Gym benchmark.
    When approximate design for fast homomorphic computation provides differential privacy guarantees. (arXiv:2304.02959v1 [cs.CR])
    While machine learning has become pervasive in as diversified fields as industry, healthcare, social networks, privacy concerns regarding the training data have gained a critical importance. In settings where several parties wish to collaboratively train a common model without jeopardizing their sensitive data, the need for a private training protocol is particularly stringent and implies to protect the data against both the model's end-users and the actors of the training phase. Differential privacy (DP) and cryptographic primitives are complementary popular countermeasures against privacy attacks. Among these cryptographic primitives, fully homomorphic encryption (FHE) offers ciphertext malleability at the cost of time-consuming operations in the homomorphic domain. In this paper, we design SHIELD, a probabilistic approximation algorithm for the argmax operator which is both fast when homomorphically executed and whose inaccuracy is used as a feature to ensure DP guarantees. Even if SHIELD could have other applications, we here focus on one setting and seamlessly integrate it in the SPEED collaborative training framework from "SPEED: Secure, PrivatE, and Efficient Deep learning" (Grivet S\'ebert et al., 2021) to improve its computational efficiency. After thoroughly describing the FHE implementation of our algorithm and its DP analysis, we present experimental results. To the best of our knowledge, it is the first work in which relaxing the accuracy of an homomorphic calculation is constructively usable as a degree of freedom to achieve better FHE performances.
    Parameterized Approximation Schemes for Clustering with General Norm Objectives. (arXiv:2304.03146v1 [cs.DS])
    This paper considers the well-studied algorithmic regime of designing a $(1+\epsilon)$-approximation algorithm for a $k$-clustering problem that runs in time $f(k,\epsilon)poly(n)$ (sometimes called an efficient parameterized approximation scheme or EPAS for short). Notable results of this kind include EPASes in the high-dimensional Euclidean setting for $k$-center [Bad\u{o}iu, Har-Peled, Indyk; STOC'02] as well as $k$-median, and $k$-means [Kumar, Sabharwal, Sen; J. ACM 2010]. However, existing EPASes handle only basic objectives (such as $k$-center, $k$-median, and $k$-means) and are tailored to the specific objective and metric space. Our main contribution is a clean and simple EPAS that settles more than ten clustering problems (across multiple well-studied objectives as well as metric spaces) and unifies well-known EPASes. Our algorithm gives EPASes for a large variety of clustering objectives (for example, $k$-means, $k$-center, $k$-median, priority $k$-center, $\ell$-centrum, ordered $k$-median, socially fair $k$-median aka robust $k$-median, or more generally monotone norm $k$-clustering) and metric spaces (for example, continuous high-dimensional Euclidean spaces, metrics of bounded doubling dimension, bounded treewidth metrics, and planar metrics). Key to our approach is a new concept that we call bounded $\epsilon$-scatter dimension--an intrinsic complexity measure of a metric space that is a relaxation of the standard notion of bounded doubling dimension. Our main technical result shows that two conditions are essentially sufficient for our algorithm to yield an EPAS on the input metric $M$ for any clustering objective: (i) The objective is described by a monotone (not necessarily symmetric!) norm, and (ii) the $\epsilon$-scatter dimension of $M$ is upper bounded by a function of $\epsilon$.
    Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling. (arXiv:2304.02806v1 [cs.LG])
    Graph neural networks (GNNs) have been widely applied to learning over graph data. Yet, real-world graphs commonly exhibit diverse graph structures and contain heterogeneous nodes and edges. Moreover, to enhance the generalization ability of GNNs, it has become common practice to further increase the diversity of training graph structures by incorporating graph augmentations and/or performing large-scale pre-training on more graphs. Therefore, it becomes essential for a GNN to simultaneously model diverse graph structures. Yet, naively increasing the GNN model capacity will suffer from both higher inference costs and the notorious trainability issue of GNNs. This paper introduces the Mixture-of-Expert (MoE) idea to GNNs, aiming to enhance their ability to accommodate the diversity of training graph structures, without incurring computational overheads. Our new Graph Mixture of Expert (GMoE) model enables each node in the graph to dynamically select its own optimal \textit{information aggregation experts}. These experts are trained to model different subgroups of graph structures in the training set. Additionally, GMoE includes information aggregation experts with varying aggregation hop sizes, where the experts with larger hop sizes are specialized in capturing information over longer ranges. The effectiveness of GMoE is verified through experimental results on a large variety of graph, node, and link prediction tasks in the OGB benchmark. For instance, it enhances ROC-AUC by $1.81\%$ in ogbg-molhiv and by $1.40\%$ in ogbg-molbbbp, as compared to the non-MoE baselines. Our code is available at https://github.com/VITA-Group/Graph-Mixture-of-Experts.
    IoT Federated Blockchain Learning at the Edge. (arXiv:2304.03006v1 [cs.LG])
    IoT devices are sorely underutilized in the medical field, especially within machine learning for medicine, yet they offer unrivaled benefits. IoT devices are low-cost, energy-efficient, small and intelligent devices. In this paper, we propose a distributed federated learning framework for IoT devices, more specifically for IoMT (Internet of Medical Things), using blockchain to allow for a decentralized scheme improving privacy and efficiency over a centralized system; this allows us to move from the cloud-based architectures, that are prevalent, to the edge. The system is designed for three paradigms: 1) Training neural networks on IoT devices to allow for collaborative training of a shared model whilst decoupling the learning from the dataset to ensure privacy. Training is performed in an online manner simultaneously amongst all participants, allowing for the training of actual data that may not have been present in a dataset collected in the traditional way and dynamically adapt the system whilst it is being trained. 2) Training of an IoMT system in a fully private manner such as to mitigate the issue with confidentiality of medical data and to build robust, and potentially bespoke, models where not much, if any, data exists. 3) Distribution of the actual network training, something federated learning itself does not do, to allow hospitals, for example, to utilize their spare computing resources to train network models.
    Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability. (arXiv:2304.02688v1 [cs.LG])
    Transferability is the property of adversarial examples to be misclassified by other models than the surrogate model for which they were crafted. Previous research has shown that transferability is substantially increased when the training of the surrogate model has been early stopped. A common hypothesis to explain this is that the later training epochs are when models learn the non-robust features that adversarial attacks exploit. Hence, an early stopped model is more robust (hence, a better surrogate) than fully trained models. We demonstrate that the reasons why early stopping improves transferability lie in the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even on models learning from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in the parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss significantly drops. This leads us to propose RFN, a new approach for transferability that minimizes loss sharpness during training in order to maximize transferability. We show that by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 points of transferability rate) and is competitive to (if not better than) strong state-of-the-art baselines.
    Learning Stability Attention in Vision-based End-to-end Driving Policies. (arXiv:2304.02733v1 [cs.RO])
    Modern end-to-end learning systems can learn to explicitly infer control from perception. However, it is difficult to guarantee stability and robustness for these systems since they are often exposed to unstructured, high-dimensional, and complex observation spaces (e.g., autonomous driving from a stream of pixel inputs). We propose to leverage control Lyapunov functions (CLFs) to equip end-to-end vision-based policies with stability properties and introduce stability attention in CLFs (att-CLFs) to tackle environmental changes and improve learning flexibility. We also present an uncertainty propagation technique that is tightly integrated into att-CLFs. We demonstrate the effectiveness of att-CLFs via comparison with classical CLFs, model predictive control, and vanilla end-to-end learning in a photo-realistic simulator and on a real full-scale autonomous vehicle.
    GIF: A General Graph Unlearning Strategy via Influence Function. (arXiv:2304.02835v1 [cs.LG])
    With the greater emphasis on privacy and security in our society, the problem of graph unlearning -- revoking the influence of specific data on the trained GNN model, is drawing increasing attention. However, ranging from machine unlearning to recently emerged graph unlearning methods, existing efforts either resort to retraining paradigm, or perform approximate erasure that fails to consider the inter-dependency between connected neighbors or imposes constraints on GNN structure, therefore hard to achieve satisfying performance-complexity trade-offs. In this work, we explore the influence function tailored for graph unlearning, so as to improve the unlearning efficacy and efficiency for graph unlearning. We first present a unified problem formulation of diverse graph unlearning tasks \wrt node, edge, and feature. Then, we recognize the crux to the inability of traditional influence function for graph unlearning, and devise Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to a $\epsilon$-mass perturbation in deleted data. The idea is to supplement the objective of the traditional influence function with an additional loss term of the influenced neighbors due to the structural dependency. Further deductions on the closed-form solution of parameter changes provide a better understanding of the unlearning mechanism. We conduct extensive experiments on four representative GNN models and three benchmark datasets to justify the superiority of GIF for diverse graph unlearning tasks in terms of unlearning efficacy, model utility, and unlearning efficiency. Our implementations are available at \url{https://github.com/wujcan/GIF-torch/}.
    Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry. (arXiv:2304.02902v1 [stat.ML])
    Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. Such symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.
    FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. (arXiv:2304.02948v1 [cs.AI])
    We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25{\deg} latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80\% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 $m^{2}/s^2$. In addition, the inference cost of each iteration is merely 600ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.
    Data Quality Over Quantity: Pitfalls and Guidelines for Process Analytics. (arXiv:2211.06440v2 [eess.SY] UPDATED)
    A significant portion of the effort involved in advanced process control, process analytics, and machine learning involves acquiring and preparing data. Literature often emphasizes increasingly complex modelling techniques with incremental performance improvements. However, when industrial case studies are published they often lack important details on data acquisition and preparation. Although data pre-processing is unfairly maligned as trivial and technically uninteresting, in practice it has an out-sized influence on the success of real-world artificial intelligence applications. This work describes best practices for acquiring and preparing operating data to pursue data-driven modelling and control opportunities in industrial processes. We present practical considerations for pre-processing industrial time series data to inform the efficient development of reliable soft sensors that provide valuable process insights.
    MethaneMapper: Spectral Absorption aware Hyperspectral Transformer for Methane Detection. (arXiv:2304.02767v1 [eess.IV])
    Methane (CH$_4$) is the chief contributor to global climate change. Recent Airborne Visible-Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) has been very useful in quantitative mapping of methane emissions. Existing methods for analyzing this data are sensitive to local terrain conditions, often require manual inspection from domain experts, prone to significant error and hence are not scalable. To address these challenges, we propose a novel end-to-end spectral absorption wavelength aware transformer network, MethaneMapper, to detect and quantify the emissions. MethaneMapper introduces two novel modules that help to locate the most relevant methane plume regions in the spectral domain and uses them to localize these accurately. Thorough evaluation shows that MethaneMapper achieves 0.63 mAP in detection and reduces the model size (by 5x) compared to the current state of the art. In addition, we also introduce a large-scale dataset of methane plume segmentation mask for over 1200 AVIRIS-NG flight lines from 2015-2022. It contains over 4000 methane plume sites. Our dataset will provide researchers the opportunity to develop and advance new methods for tackling this challenging green-house gas detection problem with significant broader social impact. Dataset and source code are public
    Multi-Linear Kernel Regression and Imputation in Data Manifolds. (arXiv:2304.03041v1 [eess.SP])
    This paper introduces an efficient multi-linear nonparametric (kernel-based) approximation framework for data regression and imputation, and its application to dynamic magnetic-resonance imaging (dMRI). Data features are assumed to reside in or close to a smooth manifold embedded in a reproducing kernel Hilbert space. Landmark points are identified to describe concisely the point cloud of features by linear approximating patches which mimic the concept of tangent spaces to smooth manifolds. The multi-linear model effects dimensionality reduction, enables efficient computations, and extracts data patterns and their geometry without any training data or additional information. Numerical tests on dMRI data under severe under-sampling demonstrate remarkable improvements in efficiency and accuracy of the proposed approach over its predecessors, popular data modeling methods, as well as recent tensor-based and deep-image-prior schemes.
    Deep learning-based image exposure enhancement as a pre-processing for an accurate 3D colon surface reconstruction. (arXiv:2304.03171v1 [eess.IV])
    This contribution shows how an appropriate image pre-processing can improve a deep-learning based 3D reconstruction of colon parts. The assumption is that, rather than global image illumination corrections, local under- and over-exposures should be corrected in colonoscopy. An overview of the pipeline including the image exposure correction and a RNN-SLAM is first given. Then, this paper quantifies the reconstruction accuracy of the endoscope trajectory in the colon with and without appropriate illumination correction
    CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval. (arXiv:2304.03158v1 [cs.CL])
    Growing techniques have been emerging to improve the performance of passage retrieval. As an effective representation bottleneck pretraining technique, the contextual masked auto-encoder utilizes contextual embedding to assist in the reconstruction of passages. However, it only uses a single auto-encoding pre-task for dense representation pre-training. This study brings multi-view modeling to the contextual masked auto-encoder. Firstly, multi-view representation utilizes both dense and sparse vectors as multi-view representations, aiming to capture sentence semantics from different aspects. Moreover, multiview decoding paradigm utilizes both autoencoding and auto-regressive decoders in representation bottleneck pre-training, aiming to provide both reconstructive and generative signals for better contextual representation pretraining. We refer to this multi-view pretraining method as CoT-MAE v2. Through extensive experiments, we show that CoT-MAE v2 is effective and robust on large-scale passage retrieval benchmarks and out-of-domain zero-shot benchmarks.
    Tag that issue: Applying API-domain labels in issue tracking systems. (arXiv:2304.02877v1 [cs.SE])
    Labeling issues with the skills required to complete them can help contributors to choose tasks in Open Source Software projects. However, manually labeling issues is time-consuming and error-prone, and current automated approaches are mostly limited to classifying issues as bugs/non-bugs. We investigate the feasibility and relevance of automatically labeling issues with what we call "API-domains," which are high-level categories of APIs. Therefore, we posit that the APIs used in the source code affected by an issue can be a proxy for the type of skills (e.g., DB, security, UI) needed to work on the issue. We ran a user study (n=74) to assess API-domain labels' relevancy to potential contributors, leveraged the issues' descriptions and the project history to build prediction models, and validated the predictions with contributors (n=20) of the projects. Our results show that (i) newcomers to the project consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with a precision of 84% and a recall of 78.6% on average, (iii) the results of the predictions reached up to 71.3% in precision and 52.5% in recall when training with a project and testing in another (transfer learning), and (iv) project contributors consider most of the predictions helpful in identifying needed skills. These findings suggest our approach can be applied in practice to automatically label issues, assisting developers in finding tasks that better match their skills.
    Private Estimation with Public Data. (arXiv:2208.07984v2 [cs.LG] UPDATED)
    We initiate the study of differentially private (DP) estimation with access to a small amount of public data. For private estimation of d-dimensional Gaussians, we assume that the public data comes from a Gaussian that may have vanishing similarity in total variation distance with the underlying Gaussian of the private data. We show that under the constraints of pure or concentrated DP, d+1 public data samples are sufficient to remove any dependence on the range parameters of the private data distribution from the private sample complexity, which is known to be otherwise necessary without public data. For separated Gaussian mixtures, we assume that the underlying public and private distributions are the same, and we consider two settings: (1) when given a dimension-independent amount of public data, the private sample complexity can be improved polynomially in terms of the number of mixture components, and any dependence on the range parameters of the distribution can be removed in the approximate DP case; (2) when given an amount of public data linear in the dimension, the private sample complexity can be made independent of range parameters even under concentrated DP, and additional improvements can be made to the overall sample complexity.
    Continual Repeated Annealed Flow Transport Monte Carlo. (arXiv:2201.13117v3 [stat.ML] UPDATED)
    We propose Continual Repeated Annealed Flow Transport Monte Carlo (CRAFT), a method that combines a sequential Monte Carlo (SMC) sampler (itself a generalization of Annealed Importance Sampling) with variational inference using normalizing flows. The normalizing flows are directly trained to transport between annealing temperatures using a KL divergence for each transition. This optimization objective is itself estimated using the normalizing flow/SMC approximation. We show conceptually and using multiple empirical examples that CRAFT improves on Annealed Flow Transport Monte Carlo (Arbel et al., 2021), on which it builds and also on Markov chain Monte Carlo (MCMC) based Stochastic Normalizing Flows (Wu et al., 2020). By incorporating CRAFT within particle MCMC, we show that such learnt samplers can achieve impressively accurate results on a challenging lattice field theory example.
    Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models. (arXiv:2304.03271v1 [cs.LG])
    The growing carbon footprint of artificial intelligence (AI) models, especially large ones such as GPT-3 and GPT-4, has been undergoing public scrutiny. Unfortunately, however, the equally important and enormous water footprint of AI models has remained under the radar. For example, training GPT-3 in Microsoft's state-of-the-art U.S. data centers can directly consume 700,000 liters of clean freshwater (enough for producing 370 BMW cars or 320 Tesla electric vehicles) and the water consumption would have been tripled if training were done in Microsoft's Asian data centers, but such information has been kept as a secret. This is extremely concerning, as freshwater scarcity has become one of the most pressing challenges shared by all of us in the wake of the rapidly growing population, depleting water resources, and aging water infrastructures. To respond to the global water challenges, AI models can, and also should, take social responsibility and lead by example by addressing their own water footprint. In this paper, we provide a principled methodology to estimate fine-grained water footprint of AI models, and also discuss the unique spatial-temporal diversities of AI models' runtime water efficiency. Finally, we highlight the necessity of holistically addressing water footprint along with carbon footprint to enable truly sustainable AI.
    Training a Two Layer ReLU Network Analytically. (arXiv:2304.02972v1 [cs.LG])
    Neural networks are usually trained with different variants of gradient descent based optimization algorithms such as stochastic gradient descent or the Adam optimizer. Recent theoretical work states that the critical points (where the gradient of the loss is zero) of two-layer ReLU networks with the square loss are not all local minima. However, in this work we will explore an algorithm for training two-layer neural networks with ReLU-like activation and the square loss that alternatively finds the critical points of the loss function analytically for one layer while keeping the other layer and the neuron activation pattern fixed. Experiments indicate that this simple algorithm can find deeper optima than Stochastic Gradient Descent or the Adam optimizer, obtaining significantly smaller training loss values on four out of the five real datasets evaluated. Moreover, the method is faster than the gradient descent methods and has virtually no tuning parameters.
    SLM: End-to-end Feature Selection via Sparse Learnable Masks. (arXiv:2304.03202v1 [cs.LG])
    Feature selection has been widely used to alleviate compute requirements during training, elucidate model interpretability, and improve model generalizability. We propose SLM -- Sparse Learnable Masks -- a canonical approach for end-to-end feature selection that scales well with respect to both the feature dimension and the number of samples. At the heart of SLM lies a simple but effective learnable sparse mask, which learns which features to select, and gives rise to a novel objective that provably maximizes the mutual information (MI) between the selected features and the labels, which can be derived from a quadratic relaxation of mutual information from first principles. In addition, we derive a scaling mechanism that allows SLM to precisely control the number of features selected, through a novel use of sparsemax. This allows for more effective learning as demonstrated in ablation studies. Empirically, SLM achieves state-of-the-art results against a variety of competitive baselines on eight benchmark datasets, often by a significant margin, especially on those with real-world challenges such as class imbalance.
    Delayed Feedback in Generalised Linear Bandits Revisited. (arXiv:2207.10786v3 [cs.LG] UPDATED)
    The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed. We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner. We show that a natural adaptation of an optimistic algorithm to the delayed feedback achieves a regret bound where the penalty for the delays is independent of the horizon. This result significantly improves upon existing work, where the best known regret bound has the delay penalty increasing with the horizon. We verify our theoretical results through experiments on simulated data.
    Data AUDIT: Identifying Attribute Utility- and Detectability-Induced Bias in Task Models. (arXiv:2304.03218v1 [cs.LG])
    To safely deploy deep learning-based computer vision models for computer-aided detection and diagnosis, we must ensure that they are robust and reliable. Towards that goal, algorithmic auditing has received substantial attention. To guide their audit procedures, existing methods rely on heuristic approaches or high-level objectives (e.g., non-discrimination in regards to protected attributes, such as sex, gender, or race). However, algorithms may show bias with respect to various attributes beyond the more obvious ones, and integrity issues related to these more subtle attributes can have serious consequences. To enable the generation of actionable, data-driven hypotheses which identify specific dataset attributes likely to induce model bias, we contribute a first technique for the rigorous, quantitative screening of medical image datasets. Drawing from literature in the causal inference and information theory domains, our procedure decomposes the risks associated with dataset attributes in terms of their detectability and utility (defined as the amount of information knowing the attribute gives about a task label). To demonstrate the effectiveness and sensitivity of our method, we develop a variety of datasets with synthetically inserted artifacts with different degrees of association to the target label that allow evaluation of inherited model biases via comparison of performance against true counterfactual examples. Using these datasets and results from hundreds of trained models, we show our screening method reliably identifies nearly imperceptible bias-inducing artifacts. Lastly, we apply our method to the natural attributes of a popular skin-lesion dataset and demonstrate its success. Our approach provides a means to perform more systematic algorithmic audits and guide future data collection efforts in pursuit of safer and more reliable models.
    Anomaly Detection via Gumbel Noise Score Matching. (arXiv:2304.03220v1 [cs.LG])
    We propose Gumbel Noise Score Matching (GNSM), a novel unsupervised method to detect anomalies in categorical data. GNSM accomplishes this by estimating the scores, i.e. the gradients of log likelihoods w.r.t.~inputs, of continuously relaxed categorical distributions. We test our method on a suite of anomaly detection tabular datasets. GNSM achieves a consistently high performance across all experiments. We further demonstrate the flexibility of GNSM by applying it to image data where the model is tasked to detect poor segmentation predictions. Images ranked anomalous by GNSM show clear segmentation failures, with the outputs of GNSM strongly correlating with segmentation metrics computed on ground-truth. We outline the score matching training objective utilized by GNSM and provide an open-source implementation of our work.
    Mask Detection and Classification in Thermal Face Images. (arXiv:2304.02931v1 [cs.CV])
    Face masks are recommended to reduce the transmission of many viruses, especially SARS-CoV-2. Therefore, the automatic detection of whether there is a mask on the face, what type of mask is worn, and how it is worn is an important research topic. In this work, the use of thermal imaging was considered to analyze the possibility of detecting (localizing) a mask on the face, as well as to check whether it is possible to classify the type of mask on the face. The previously proposed dataset of thermal images was extended and annotated with the description of a type of mask and a location of a mask within a face. Different deep learning models were adapted. The best model for face mask detection turned out to be the Yolov5 model in the "nano" version, reaching mAP higher than 97% and precision of about 95%. High accuracy was also obtained for mask type classification. The best results were obtained for the convolutional neural network model built on an autoencoder initially trained in the thermal image reconstruction problem. The pretrained encoder was used to train a classifier which achieved an accuracy of 91%.
    Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster. (arXiv:2304.03208v1 [cs.LG])
    We study recent research advances that improve large language models through efficient pre-training and scaling, and open datasets and tools. We combine these advances to introduce Cerebras-GPT, a family of open compute-optimal language models scaled from 111M to 13B parameters. We train Cerebras-GPT models on the Eleuther Pile dataset following DeepMind Chinchilla scaling rules for efficient pre-training (highest accuracy for a given compute budget). We characterize the predictable power-law scaling and compare Cerebras-GPT with other publicly-available models to show all Cerebras-GPT models have state-of-the-art training efficiency on both pre-training and downstream objectives. We describe our learnings including how Maximal Update Parameterization ($\mu$P) can further improve large model scaling, improving accuracy and hyperparameter predictability at scale. We release our pre-trained models and code, making this paper the first open and reproducible work comparing compute-optimal model scaling to models trained on fixed dataset sizes. Cerebras-GPT models are available on HuggingFace: https://huggingface.co/cerebras.
    WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series. (arXiv:2203.09978v2 [cs.LG] UPDATED)
    Machine learning models often fail to generalize well under distributional shifts. Understanding and overcoming these failures have led to a research field of Out-of-Distribution (OOD) generalization. Despite being extensively studied for static computer vision tasks, OOD generalization has been underexplored for time series tasks. To shine light on this gap, we present WOODS: eight challenging open-source time series benchmarks covering a diverse range of data modalities, such as videos, brain recordings, and sensor signals. We revise the existing OOD generalization algorithms for time series tasks and evaluate them using our systematic framework. Our experiments show a large room for improvement for empirical risk minimization and OOD generalization algorithms on our datasets, thus underscoring the new challenges posed by time series tasks. Code and documentation are available at https://woods-benchmarks.github.io .
    Adaptive Student's t-distribution with method of moments moving estimator for nonstationary time series. (arXiv:2304.03069v1 [stat.ME])
    The real life time series are usually nonstationary, bringing a difficult question of model adaptation. Classical approaches like GARCH assume arbitrary type of dependence. To prevent such bias, we will focus on recently proposed agnostic philosophy of moving estimator: in time $t$ finding parameters optimizing e.g. $F_t=\sum_{\tau<t} (1-\eta)^{t-\tau} \ln(\rho_\theta (x_\tau))$ moving log-likelihood, evolving in time. It allows for example to estimate parameters using inexpensive exponential moving averages (EMA), like absolute central moments $E[|x-\mu|^p]$ evolving with $m_{p,t+1} = m_{p,t} + \eta (|x_t-\mu_t|^p-m_{p,t})$ for one or multiple powers $p\in\mathbb{R}^+$. Application of such general adaptive methods of moments will be presented on Student's t-distribution, popular especially in economical applications, here applied to log-returns of DJIA companies.
    Inductive Graph Unlearning. (arXiv:2304.03093v1 [cs.LG])
    As a way to implement the "right to be forgotten" in machine learning, \textit{machine unlearning} aims to completely remove the contributions and information of the samples to be deleted from a trained model without affecting the contributions of other samples. Recently, many frameworks for machine unlearning have been proposed, and most of them focus on image and text data. To extend machine unlearning to graph data, \textit{GraphEraser} has been proposed. However, a critical issue is that \textit{GraphEraser} is specifically designed for the transductive graph setting, where the graph is static and attributes and edges of test nodes are visible during training. It is unsuitable for the inductive setting, where the graph could be dynamic and the test graph information is invisible in advance. Such inductive capability is essential for production machine learning systems with evolving graphs like social media and transaction networks. To fill this gap, we propose the \underline{{\bf G}}\underline{{\bf U}}ided \underline{{\bf I}}n\underline{{\bf D}}uctiv\underline{{\bf E}} Graph Unlearning framework (GUIDE). GUIDE consists of three components: guided graph partitioning with fairness and balance, efficient subgraph repair, and similarity-based aggregation. Empirically, we evaluate our method on several inductive benchmarks and evolving transaction graphs. Generally speaking, GUIDE can be efficiently implemented on the inductive graph learning tasks for its low graph partition cost, no matter on computation or structure information. The code will be available here: https://github.com/Happy2Git/GUIDE.
    A Fast and Lightweight Network for Low-Light Image Enhancement. (arXiv:2304.02978v1 [cs.CV])
    Low-light images often suffer from severe noise, low brightness, low contrast, and color deviation. While several low-light image enhancement methods have been proposed, there remains a lack of efficient methods that can simultaneously solve all of these problems. In this paper, we introduce FLW-Net, a Fast and LightWeight Network for low-light image enhancement that significantly improves processing speed and overall effect. To achieve efficient low-light image enhancement, we recognize the challenges of the lack of an absolute reference and the need for a large receptive field to obtain global contrast. Therefore, we propose an efficient global feature information extraction component and design loss functions based on relative information to overcome these challenges. Finally, we conduct comparative experiments to demonstrate the effectiveness of the proposed method, and the results confirm that FLW-Net can significantly reduce the complexity of supervised low-light image enhancement networks while improving processing effect. Code is available at https://github.com/hitzhangyu/FLW-Net
    Efficient SAGE Estimation via Causal Structure Learning. (arXiv:2304.03113v1 [stat.ML])
    The Shapley Additive Global Importance (SAGE) value is a theoretically appealing interpretability method that fairly attributes global importance to a model's features. However, its exact calculation requires the computation of the feature's surplus performance contributions over an exponential number of feature sets. This is computationally expensive, particularly because estimating the surplus contributions requires sampling from conditional distributions. Thus, SAGE approximation algorithms only take a fraction of the feature sets into account. We propose $d$-SAGE, a method that accelerates SAGE approximation. $d$-SAGE is motivated by the observation that conditional independencies (CIs) between a feature and the model target imply zero surplus contributions, such that their computation can be skipped. To identify CIs, we leverage causal structure learning (CSL) to infer a graph that encodes (conditional) independencies in the data as $d$-separations. This is computationally more efficient because the expense of the one-time graph inference and the $d$-separation queries is negligible compared to the expense of surplus contribution evaluations. Empirically we demonstrate that $d$-SAGE enables the efficient and accurate estimation of SAGE values.
    Constructing Phylogenetic Networks via Cherry Picking and Machine Learning. (arXiv:2304.02729v1 [q-bio.PE])
    Combining a set of phylogenetic trees into a single phylogenetic network that explains all of them is a fundamental challenge in evolutionary studies. Existing methods are computationally expensive and can either handle only small numbers of phylogenetic trees or are limited to severely restricted classes of networks. In this paper, we apply the recently-introduced theoretical framework of cherry picking to design a class of efficient heuristics that are guaranteed to produce a network containing each of the input trees, for datasets consisting of binary trees. Some of the heuristics in this framework are based on the design and training of a machine learning model that captures essential information on the structure of the input trees and guides the algorithms towards better solutions. We also propose simple and fast randomised heuristics that prove to be very effective when run multiple times. Unlike the existing exact methods, our heuristics are applicable to datasets of practical size, and the experimental study we conducted on both simulated and real data shows that these solutions are qualitatively good, always within some small constant factor from the optimum. Moreover, our machine-learned heuristics are one of the first applications of machine learning to phylogenetics and show its promise.
    Characterizing the Users, Challenges, and Visualization Needs of Knowledge Graphs in Practice. (arXiv:2304.01311v2 [cs.HC] UPDATED)
    This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas among KG practitioners - KG Builders, Analysts, and Consumers - each of whom have their own distinct expertise and needs. We discover that KG Builders would benefit from schema enforcers, while KG Analysts need customizable query builders that provide interim query results. For KG Consumers, we identify a lack of efficacy for node-link diagrams, and the need for tailored domain-specific visualizations to promote KG adoption and comprehension. Lastly, we find that implementing KGs effectively in practice requires both technical and social solutions that are not addressed with current tools, technologies, and collaborative workflows. From the analysis of our interviews, we distill several visualization research directions to improve KG usability, including knowledge cards that balance digestibility and discoverability, timeline views to track temporal changes, interfaces that support organic discovery, and semantic explanations for AI and machine learning predictions.
    Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. (arXiv:2304.02711v1 [cs.AI])
    Creating knowledge bases and ontologies is a time consuming task that relies on a manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. SPIRES is available as part of the open source OntoGPT package: https://github.com/ monarch-initiative/ontogpt.
    ViralVectors: Compact and Scalable Alignment-free Virome Feature Generation. (arXiv:2304.02891v1 [q-bio.GN])
    The amount of sequencing data for SARS-CoV-2 is several orders of magnitude larger than any virus. This will continue to grow geometrically for SARS-CoV-2, and other viruses, as many countries heavily finance genomic surveillance efforts. Hence, we need methods for processing large amounts of sequence data to allow for effective yet timely decision-making. Such data will come from heterogeneous sources: aligned, unaligned, or even unassembled raw nucleotide or amino acid sequencing reads pertaining to the whole genome or regions (e.g., spike) of interest. In this work, we propose \emph{ViralVectors}, a compact feature vector generation from virome sequencing data that allows effective downstream analysis. Such generation is based on \emph{minimizers}, a type of lightweight "signature" of a sequence, used traditionally in assembly and read mapping -- to our knowledge, the first use minimizers in this way. We validate our approach on different types of sequencing data: (a) 2.5M SARS-CoV-2 spike sequences (to show scalability); (b) 3K Coronaviridae spike sequences (to show robustness to more genomic variability); and (c) 4K raw WGS reads sets taken from nasal-swab PCR tests (to show the ability to process unassembled reads). Our results show that ViralVectors outperforms current benchmarks in most classification and clustering tasks.
    Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification. (arXiv:2304.02836v1 [eess.IV])
    The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signatures expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers.  ( 3 min )
    Robustmix: Improving Robustness by Regularizing the Frequency Bias of Deep Nets. (arXiv:2304.02847v1 [cs.CV])
    Deep networks have achieved impressive results on a range of well-curated benchmark datasets. Surprisingly, their performance remains sensitive to perturbations that have little effect on human performance. In this work, we propose a novel extension of Mixup called Robustmix that regularizes networks to classify based on lower-frequency spatial features. We show that this type of regularization improves robustness on a range of benchmarks such as Imagenet-C and Stylized Imagenet. It adds little computational overhead and, furthermore, does not require a priori knowledge of a large set of image transformations. We find that this approach further complements recent advances in model architecture and data augmentation, attaining a state-of-the-art mCE of 44.8 with an EfficientNet-B8 model and RandAugment, which is a reduction of 16 mCE compared to the baseline.  ( 2 min )
    Pairwise Ranking with Gaussian Kernels. (arXiv:2304.03185v1 [stat.ML])
    Regularized pairwise ranking with Gaussian kernels is one of the cutting-edge learning algorithms. Despite a wide range of applications, a rigorous theoretical demonstration still lacks to support the performance of such ranking estimators. This work aims to fill this gap by developing novel oracle inequalities for regularized pairwise ranking. With the help of these oracle inequalities, we derive fast learning rates of Gaussian ranking estimators under a general box-counting dimension assumption on the input domain combined with the noise conditions or the standard smoothness condition. Our theoretical analysis improves the existing estimates and shows that a low intrinsic dimension of input space can help the rates circumvent the curse of dimensionality.  ( 2 min )
    Robust Neural Architecture Search. (arXiv:2304.02845v1 [cs.LG])
    Neural Architectures Search (NAS) becomes more and more popular over these years. However, NAS-generated models tends to suffer greater vulnerability to various malicious attacks. Lots of robust NAS methods leverage adversarial training to enhance the robustness of NAS-generated models, however, they neglected the nature accuracy of NAS-generated models. In our paper, we propose a novel NAS method, Robust Neural Architecture Search (RNAS). To design a regularization term to balance accuracy and robustness, RNAS generates architectures with both high accuracy and good robustness. To reduce search cost, we further propose to use noise examples instead adversarial examples as input to search architectures. Extensive experiments show that RNAS achieves state-of-the-art (SOTA) performance on both image classification and adversarial attacks, which illustrates the proposed RNAS achieves a good tradeoff between robustness and accuracy.  ( 2 min )
    Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks. (arXiv:2304.02654v1 [cs.LG])
    Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs) to achieve human-competitive performance in a variety of artificial intelligence tasks. Often consisting of hundreds of millions, if not hundreds of billion parameters, these DNNs are too large to be deployed to, or efficiently run on resource-constrained devices such as mobile phones or IoT microcontrollers. Systems relying on large-scale DNNs thus have to call the corresponding model over the network, leading to substantial costs for hosting and running the large-scale remote model, costs which are often charged on a per-use basis. In this paper, we propose BiSupervised, a novel architecture, where, before relying on a large remote DNN, a system attempts to make a prediction on a small-scale local model. A DNN supervisor monitors said prediction process and identifies easy inputs for which the local prediction can be trusted. For these inputs, the remote model does not have to be invoked, thus saving costs, while only marginally impacting the overall system accuracy. Our architecture furthermore foresees a second supervisor to monitor the remote predictions and identify inputs for which not even these can be trusted, allowing to raise an exception or run a fallback strategy instead. We evaluate the cost savings, and the ability to detect incorrectly predicted inputs on four diverse case studies: IMDB movie review sentiment classification, Github issue triaging, Imagenet image classification, and SQuADv2 free-text question answering  ( 3 min )
    FMG-Net and W-Net: Multigrid Inspired Deep Learning Architectures For Medical Imaging Segmentation. (arXiv:2304.02725v1 [eess.IV])
    Accurate medical imaging segmentation is critical for precise and effective medical interventions. However, despite the success of convolutional neural networks (CNNs) in medical image segmentation, they still face challenges in handling fine-scale features and variations in image scales. These challenges are particularly evident in complex and challenging segmentation tasks, such as the BraTS multi-label brain tumor segmentation challenge. In this task, accurately segmenting the various tumor sub-components, which vary significantly in size and shape, remains a significant challenge, with even state-of-the-art methods producing substantial errors. Therefore, we propose two architectures, FMG-Net and W-Net, that incorporate the principles of geometric multigrid methods for solving linear systems of equations into CNNs to address these challenges. Our experiments on the BraTS 2020 dataset demonstrate that both FMG-Net and W-Net outperform the widely used U-Net architecture regarding tumor subcomponent segmentation accuracy and training efficiency. These findings highlight the potential of incorporating the principles of multigrid methods into CNNs to improve the accuracy and efficiency of medical imaging segmentation.  ( 2 min )
    Source-free Domain Adaptation Requires Penalized Diversity. (arXiv:2304.02798v1 [cs.LG])
    While neural networks are capable of achieving human-like performance in many tasks such as image classification, the impressive performance of each model is limited to its own dataset. Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data, thus, increasing data privacy. Diversity in representation space can be vital to a model`s adaptability in varied and difficult domains. In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor. Motivated by the improved predictive performance of ensembles, we propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors with Distinct Backbone Architectures (DBA). Although diversity in feature space is increased, the unconstrained mutual information (MI) maximization may potentially introduce amplification of weak hypotheses. Thus we introduce the Weak Hypothesis Penalization (WHP) regularizer as a mitigation strategy. Our work proposes Penalized Diversity (PD) where the synergy of DBA and WHP is applied to unsupervised source-free domain adaptation for covariate shift. In addition, PD is augmented with a weighted MI maximization objective for label distribution shift. Empirical results on natural, synthetic, and medical domains demonstrate the effectiveness of PD under different distributional shifts.  ( 3 min )
    Exploring the Utility of Self-Supervised Pretraining Strategies for the Detection of Absent Lung Sliding in M-Mode Lung Ultrasound. (arXiv:2304.02724v1 [cs.CV])
    Self-supervised pretraining has been observed to improve performance in supervised learning tasks in medical imaging. This study investigates the utility of self-supervised pretraining prior to conducting supervised fine-tuning for the downstream task of lung sliding classification in M-mode lung ultrasound images. We propose a novel pairwise relationship that couples M-mode images constructed from the same B-mode image and investigate the utility of data augmentation procedure specific to M-mode lung ultrasound. The results indicate that self-supervised pretraining yields better performance than full supervision, most notably for feature extractors not initialized with ImageNet-pretrained weights. Moreover, we observe that including a vast volume of unlabelled data results in improved performance on external validation datasets, underscoring the value of self-supervision for improving generalizability in automatic ultrasound interpretation. To the authors' best knowledge, this study is the first to characterize the influence of self-supervised pretraining for M-mode ultrasound.  ( 2 min )
    NTK-SAP: Improving neural network pruning by aligning training dynamics. (arXiv:2304.02840v1 [cs.LG])
    Pruning neural networks before training has received increasing interest due to its potential to reduce training time and memory. One popular method is to prune the connections based on a certain metric, but it is not entirely clear what metric is the best choice. Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of large enough neural networks is closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. This method can help maintain the NTK spectrum, which may help align the training dynamics to that of its dense counterpart. However, one possible issue is that the fixed-weight-NTK corresponding to a given initial point can be very different from the NTK corresponding to later iterates during the training phase. We further propose to sample multiple realizations of random weights to estimate the NTK spectrum. Note that our approach is weight-agnostic, which is different from most existing methods that are weight-dependent. In addition, we use random inputs to compute the fixed-weight-NTK, making our method data-agnostic as well. We name our foresight pruning algorithm Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP). Empirically, our method achieves better performance than all baselines on multiple datasets.  ( 3 min )
    TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph. (arXiv:2304.02838v1 [cs.CR])
    APT detection is difficult to detect due to the long-term latency, covert and slow multistage attack patterns of Advanced Persistent Threat (APT). To tackle these issues, we propose TBDetector, a transformer-based advanced persistent threat detection method for APT attack detection. Considering that provenance graphs provide rich historical information and have the powerful attacks historic correlation ability to identify anomalous activities, TBDetector employs provenance analysis for APT detection, which summarizes long-running system execution with space efficiency and utilizes transformer with self-attention based encoder-decoder to extract long-term contextual features of system states to detect slow-acting attacks. Furthermore, we further introduce anomaly scores to investigate the anomaly of different system states, where each state is calculated with an anomaly score corresponding to its similarity score and isolation score. To evaluate the effectiveness of the proposed method, we have conducted experiments on five public datasets, i.e., streamspot, cadets, shellshock, clearscope, and wget_baseline. Experimental results and comparisons with state-of-the-art methods have exhibited better performance of our proposed method.  ( 2 min )
    Probing the Purview of Neural Networks via Gradient Analysis. (arXiv:2304.02834v1 [cs.LG])
    We analyze the data-dependent capacity of neural networks and assess anomalies in inputs from the perspective of networks during inference. The notion of data-dependent capacity allows for analyzing the knowledge base of a model populated by learned features from training data. We define purview as the additional capacity necessary to characterize inference samples that differ from the training data. To probe the purview of a network, we utilize gradients to measure the amount of change required for the model to characterize the given inputs more accurately. To eliminate the dependency on ground-truth labels in generating gradients, we introduce confounding labels that are formulated by combining multiple categorical labels. We demonstrate that our gradient-based approach can effectively differentiate inputs that cannot be accurately represented with learned features. We utilize our approach in applications of detecting anomalous inputs, including out-of-distribution, adversarial, and corrupted samples. Our approach requires no hyperparameter tuning or additional data processing and outperforms state-of-the-art methods by up to 2.7%, 19.8%, and 35.6% of AUROC scores, respectively.  ( 2 min )
    Application of Transformers based methods in Electronic Medical Records: A Systematic Literature Review. (arXiv:2304.02768v1 [cs.CL])
    The combined growth of available data and their unstructured nature has received increased interest in natural language processing (NLP) techniques to make value of these data assets since this format is not suitable for statistical analysis. This work presents a systematic literature review of state-of-the-art advances using transformer-based methods on electronic medical records (EMRs) in different NLP tasks. To the best of our knowledge, this work is unique in providing a comprehensive review of research on transformer-based methods for NLP applied to the EMR field. In the initial query, 99 articles were selected from three public databases and filtered into 65 articles for detailed analysis. The papers were analyzed with respect to the business problem, NLP task, models and techniques, availability of datasets, reproducibility of modeling, language, and exchange format. The paper presents some limitations of current research and some recommendations for further research.  ( 2 min )
    SoK: Machine Learning for Continuous Integration. (arXiv:2304.02829v1 [cs.SE])
    Continuous Integration (CI) has become a well-established software development practice for automatically and continuously integrating code changes during software development. An increasing number of Machine Learning (ML) based approaches for automation of CI phases are being reported in the literature. It is timely and relevant to provide a Systemization of Knowledge (SoK) of ML-based approaches for CI phases. This paper reports an SoK of different aspects of the use of ML for CI. Our systematic analysis also highlights the deficiencies of the existing ML-based solutions that can be improved for advancing the state-of-the-art.  ( 2 min )
    GPT detectors are biased against non-native English writers. (arXiv:2304.02819v1 [cs.CL])
    The rapid adoption of generative language models has brought about substantial advancements in digital communication, while simultaneously raising concerns regarding the potential misuse of AI-generated content. Although numerous detection methods have been proposed to differentiate between AI and human-generated content, the fairness and robustness of these detectors remain underexplored. In this study, we evaluate the performance of several widely-used GPT detectors using writing samples from native and non-native English writers. Our findings reveal that these detectors consistently misclassify non-native English writing samples as AI-generated, whereas native writing samples are accurately identified. Furthermore, we demonstrate that simple prompting strategies can not only mitigate this bias but also effectively bypass GPT detectors, suggesting that GPT detectors may unintentionally penalize writers with constrained linguistic expressions. Our results call for a broader conversation about the ethical implications of deploying ChatGPT content detectors and caution against their use in evaluative or educational settings, particularly when they may inadvertently penalize or exclude non-native English speakers from the global discourse.  ( 2 min )
    Hybrid Zonotopes Exactly Represent ReLU Neural Networks. (arXiv:2304.02755v1 [cs.LG])
    We show that hybrid zonotopes offer an equivalent representation of feed-forward fully connected neural networks with ReLU activation functions. Our approach demonstrates that the complexity of binary variables is equal to the total number of neurons in the network and hence grows linearly in the size of the network. We demonstrate the utility of the hybrid zonotope formulation through three case studies including nonlinear function approximation, MPC closed-loop reachability and verification, and robustness of classification on the MNIST dataset.  ( 2 min )
    Graph Representation Learning for Interactive Biomolecule Systems. (arXiv:2304.02656v1 [q-bio.QM])
    Advances in deep learning models have revolutionized the study of biomolecule systems and their mechanisms. Graph representation learning, in particular, is important for accurately capturing the geometric information of biomolecules at different levels. This paper presents a comprehensive review of the methodologies used to represent biological molecules and systems as computer-recognizable objects, such as sequences, graphs, and surfaces. Moreover, it examines how geometric deep learning models, with an emphasis on graph-based techniques, can analyze biomolecule data to enable drug discovery, protein characterization, and biological system analysis. The study concludes with an overview of the current state of the field, highlighting the challenges that exist and the potential future research directions.  ( 2 min )
    Logistic-Normal Likelihoods for Heteroscedastic Label Noise in Classification. (arXiv:2304.02849v1 [cs.LG])
    A natural way of estimating heteroscedastic label noise in regression is to model the observed (potentially noisy) target as a sample from a normal distribution, whose parameters can be learned by minimizing the negative log-likelihood. This loss has desirable loss attenuation properties, as it can reduce the contribution of high-error examples. Intuitively, this behavior can improve robustness against label noise by reducing overfitting. We propose an extension of this simple and probabilistic approach to classification that has the same desirable loss attenuation properties. We evaluate the effectiveness of the method by measuring its robustness against label noise in classification. We perform enlightening experiments exploring the inner workings of the method, including sensitivity to hyperparameters, ablation studies, and more.  ( 2 min )
    Agnostic proper learning of monotone functions: beyond the black-box correction barrier. (arXiv:2304.02700v1 [cs.DS])
    We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$ uniformly random examples of an unknown function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$, our algorithm outputs a hypothesis $g:\{\pm 1\}^n \rightarrow \{\pm 1\}$ that is monotone and $(\mathrm{opt} + \varepsilon)$-close to $f$, where $\mathrm{opt}$ is the distance from $f$ to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$, nearly matching the lower bound of Blais et al (RANDOM '15). We also give an algorithm for estimating up to additive error $\varepsilon$ the distance of an unknown function $f$ to monotone using a run-time of $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$. Previously, for both of these problems, sample-efficient algorithms were known, but these algorithms were not run-time efficient. Our work thus closes this gap in our knowledge between the run-time and sample complexity. This work builds upon the improper learning algorithm of Bshouty and Tamon (JACM '96) and the proper semiagnostic learning algorithm of Lange, Rubinfeld, and Vasilyan (FOCS '22), which obtains a non-monotone Boolean-valued hypothesis, then ``corrects'' it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than $2\mathrm{opt} + \varepsilon$ information-theoretically; we bypass this barrier by a) augmenting the improper learner with a convex optimization step, and b) learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the ``poset sorting'' problem of [LRV22] for functions over general posets with non-Boolean labels.  ( 3 min )
    Bengali Fake Review Detection using Semi-supervised Generative Adversarial Networks. (arXiv:2304.02739v1 [cs.CL])
    This paper investigates the potential of semi-supervised Generative Adversarial Networks (GANs) to fine-tune pretrained language models in order to classify Bengali fake reviews from real reviews with a few annotated data. With the rise of social media and e-commerce, the ability to detect fake or deceptive reviews is becoming increasingly important in order to protect consumers from being misled by false information. Any machine learning model will have trouble identifying a fake review, especially for a low resource language like Bengali. We have demonstrated that the proposed semi-supervised GAN-LM architecture (generative adversarial network on top of a pretrained language model) is a viable solution in classifying Bengali fake reviews as the experimental results suggest that even with only 1024 annotated samples, BanglaBERT with semi-supervised GAN (SSGAN) achieved an accuracy of 83.59% and a f1-score of 84.89% outperforming other pretrained language models - BanglaBERT generator, Bangla BERT Base and Bangla-Electra by almost 3%, 4% and 10% respectively in terms of accuracy. The experiments were conducted on a manually labeled food review dataset consisting of total 6014 real and fake reviews collected from various social media groups. Researchers that are experiencing difficulty recognizing not just fake reviews but other classification issues owing to a lack of labeled data may find a solution in our proposed methodology.  ( 3 min )
    UNICORN: A Unified Backdoor Trigger Inversion Framework. (arXiv:2304.02786v1 [cs.LG])
    The backdoor attack, where the adversary uses inputs stamped with triggers (e.g., a patch) to activate pre-planted malicious behaviors, is a severe threat to Deep Neural Network (DNN) models. Trigger inversion is an effective way of identifying backdoor models and understanding embedded adversarial behaviors. A challenge of trigger inversion is that there are many ways of constructing the trigger. Existing methods cannot generalize to various types of triggers by making certain assumptions or attack-specific constraints. The fundamental reason is that existing work does not consider the trigger's design space in their formulation of the inversion problem. This work formally defines and analyzes the triggers injected in different spaces and the inversion problem. Then, it proposes a unified framework to invert backdoor triggers based on the formalization of triggers and the identified inner behaviors of backdoor models from our analysis. Our prototype UNICORN is general and effective in inverting backdoor triggers in DNNs. The code can be found at https://github.com/RU-System-Software-and-Security/UNICORN.  ( 2 min )
    Heavy-Tailed Regularization of Weight Matrices in Deep Neural Networks. (arXiv:2304.02911v1 [stat.ML])
    Unraveling the reasons behind the remarkable success and exceptional generalization capabilities of deep neural networks presents a formidable challenge. Recent insights from random matrix theory, specifically those concerning the spectral analysis of weight matrices in deep neural networks, offer valuable clues to address this issue. A key finding indicates that the generalization performance of a neural network is associated with the degree of heavy tails in the spectrum of its weight matrices. To capitalize on this discovery, we introduce a novel regularization technique, termed Heavy-Tailed Regularization, which explicitly promotes a more heavy-tailed spectrum in the weight matrix through regularization. Firstly, we employ the Weighted Alpha and Stable Rank as penalty terms, both of which are differentiable, enabling the direct calculation of their gradients. To circumvent over-regularization, we introduce two variations of the penalty function. Then, adopting a Bayesian statistics perspective and leveraging knowledge from random matrices, we develop two novel heavy-tailed regularization methods, utilizing Powerlaw distribution and Frechet distribution as priors for the global spectrum and maximum eigenvalues, respectively. We empirically show that heavytailed regularization outperforms conventional regularization techniques in terms of generalization performance.  ( 2 min )
    A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation. (arXiv:2304.02858v1 [cs.LG])
    Class imbalance (CI) in classification problems arises when the number of observations belonging to one class is lower than the other classes. Ensemble learning that combines multiple models to obtain a robust model has been prominently used with data augmentation methods to address class imbalance problems. In the last decade, a number of strategies have been added to enhance ensemble learning and data augmentation methods, along with new methods such as generative adversarial networks (GANs). A combination of these has been applied in many studies, but the true rank of different combinations would require a computational review. In this paper, we present a computational review to evaluate data augmentation and ensemble learning methods used to address prominent benchmark CI problems. We propose a general framework that evaluates 10 data augmentation and 10 ensemble learning methods for CI problems. Our objective was to identify the most effective combination for improving classification performance on imbalanced datasets. The results indicate that combinations of data augmentation methods with ensemble learning can significantly improve classification performance on imbalanced datasets. These findings have important implications for the development of more effective approaches for handling imbalanced datasets in machine learning applications.  ( 3 min )
    ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast. (arXiv:2304.02689v1 [cs.CV])
    Medical data often exhibits long-tail distributions with heavy class imbalance, which naturally leads to difficulty in classifying the minority classes (i.e., boundary regions or rare objects). Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping them with unsupervised contrastive criteria. However, it remains unclear how well they will perform in the labeled portion of data where class distribution is also highly imbalanced. In this work, we present ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Specifically, we propose an adaptive supervised contrastive loss, where we first compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers. Moreover, we argue that blindly adopting a constant temperature $\tau$ in the contrastive loss on long-tailed medical data is not optimal, and propose to use a dynamic $\tau$ via a simple cosine schedule to yield better separation between majority and minority classes. Empirically, we evaluate ACTION++ on ACDC and LA benchmarks and show that it achieves state-of-the-art across two semi-supervised settings. Theoretically, we analyze the performance of adaptive anatomical contrast and confirm its superiority in label efficiency.  ( 3 min )
    Adaptable and Interpretable Framework for Novelty Detection in Real-Time IoT Systems. (arXiv:2304.02947v1 [cs.LG])
    This paper presents the Real-time Adaptive and Interpretable Detection (RAID) algorithm. The novel approach addresses the limitations of state-of-the-art anomaly detection methods for multivariate dynamic processes, which are restricted to detecting anomalies within the scope of the model training conditions. The RAID algorithm adapts to non-stationary effects such as data drift and change points that may not be accounted for during model development, resulting in prolonged service life. A dynamic model based on joint probability distribution handles anomalous behavior detection in a system and the root cause isolation based on adaptive process limits. RAID algorithm does not require changes to existing process automation infrastructures, making it highly deployable across different domains. Two case studies involving real dynamic system data demonstrate the benefits of the RAID algorithm, including change point adaptation, root cause isolation, and improved detection accuracy.  ( 2 min )
    A Certified Radius-Guided Attack Framework to Image Segmentation Models. (arXiv:2304.02693v1 [cs.CV])
    Image segmentation is an important problem in many safety-critical applications. Recent studies show that modern image segmentation models are vulnerable to adversarial perturbations, while existing attack methods mainly follow the idea of attacking image classification models. We argue that image segmentation and classification have inherent differences, and design an attack framework specially for image segmentation models. Our attack framework is inspired by certified radius, which was originally used by defenders to defend against adversarial perturbations to classification models. We are the first, from the attacker perspective, to leverage the properties of certified radius and propose a certified radius guided attack framework against image segmentation models. Specifically, we first adapt randomized smoothing, the state-of-the-art certification method for classification models, to derive the pixel's certified radius. We then focus more on disrupting pixels with relatively smaller certified radii and design a pixel-wise certified radius guided loss, when plugged into any existing white-box attack, yields our certified radius-guided white-box attack. Next, we propose the first black-box attack to image segmentation models via bandit. We design a novel gradient estimator, based on bandit feedback, which is query-efficient and provably unbiased and stable. We use this gradient estimator to design a projected bandit gradient descent (PBGD) attack, as well as a certified radius-guided PBGD (CR-PBGD) attack. We prove our PBGD and CR-PBGD attacks can achieve asymptotically optimal attack performance with an optimal rate. We evaluate our certified-radius guided white-box and black-box attacks on multiple modern image segmentation models and datasets. Our results validate the effectiveness of our certified radius-guided attack framework.  ( 3 min )
    Predictive Coding as a Neuromorphic Alternative to Backpropagation: A Critical Evaluation. (arXiv:2304.02658v1 [cs.NE])
    Backpropagation has rapidly become the workhorse credit assignment algorithm for modern deep learning methods. Recently, modified forms of predictive coding (PC), an algorithm with origins in computational neuroscience, have been shown to result in approximately or exactly equal parameter updates to those under backpropagation. Due to this connection, it has been suggested that PC can act as an alternative to backpropagation with desirable properties that may facilitate implementation in neuromorphic systems. Here, we explore these claims using the different contemporary PC variants proposed in the literature. We obtain time complexity bounds for these PC variants which we show are lower-bounded by backpropagation. We also present key properties of these variants that have implications for neurobiological plausibility and their interpretations, particularly from the perspective of standard PC as a variational Bayes algorithm for latent probabilistic models. Our findings shed new light on the connection between the two learning frameworks and suggest that, in its current forms, PC may have more limited potential as a direct replacement of backpropagation than previously envisioned.  ( 2 min )

  • Open

    Is there any research on Multi-Agent IRL with 2 teams of agents?
    Completely new to reinforcement learning. I'm working on the very ambitious task of expressing the game of football as a markov game and trying to obtain a reward function, however I couldn't really find research for multi-agent IRL in the case of 2 teams of agents, each trying with different objectives but that cooperate within the same team. Has this been done? submitted by /u/macaco3001 [link] [comments]  ( 42 min )
    Slowly increasing environment difficulty over time during training?
    Currently I am dealing with the problem that if I let my agent play in the full environment it never manages to win the game since winning the game requires picking reasonable moves many times in a row and making even a few stupid moves will never allow you to win. Therefore it can never get a reward and start training. Would it make sense to start with a small toy version of the environment and slowly adjust to the real size over time? I have a pretty natural parameter in the environment that would allow me to slowly do this. Any ideas if this is sometimes typically done and whether or not it’s bad practice? submitted by /u/Invariant_apple [link] [comments]  ( 43 min )
    Deep Reinforcement Learning with ray.io and AWS
    Have anyone worked with Deep Reinforcement Learning(especially asynchronous algorithms) with integration with Ray and AWS? If yes please provide me with a link to your project. It will be very helpful. submitted by /u/Alam_12703 [link] [comments]  ( 42 min )
    _Probabilistic Numerics: Computation as Machine Learning_, Hennig et al 2022
    submitted by /u/gwern [link] [comments]  ( 41 min )
    Deepbots 1.0 release: Reinforcement Learning in Webots
    submitted by /u/aidudezzz [link] [comments]  ( 42 min )
    cleanrl gym issues
    I'm trying to run cleanrl in ubuntu 22.04 but keep getting this error. I have done pip install gym and pip install gym[all] but nothing seems to work. ​ https://preview.redd.it/jb6f3a8w5gsa1.png?width=711&format=png&auto=webp&s=2e34bbe91b1d51032c91a448960864612880a00e submitted by /u/kwasi3114 [link] [comments]  ( 42 min )
    Cartpole Tutorial gym[box2d] package render issue
    Hello I am currently following this tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html and have trouble the calling the 'env.render()' in the beginning of the 'for t in count():' where the environment box does not show up. I tried installing via >>pip install gym[box2d] (also tried >>pip install gymnasium[box2d]) and says it is a 'legacy-install-failure' error. Same issue when following sentdex's stable-baselines3 YouTube tutorial series. I am using a Python 3.10 interpreter and tried Python 3.6. Has anyone else had a similar issue and is there a way to get it working? ​ Any help would be greatly appreciated. ​ https://preview.redd.it/x40hd5ingfsa1.png?width=1510&format=png&auto=webp&s=c9087addbd4b346061dde14ee1ec517407606ac9 submitted by /u/skizblue [link] [comments]  ( 42 min )
    Loss is ok. But the gradient is nan.
    I am trying to implement a PPO. After a few epochs (sometimes 80, sometimes 400) I get this: "RuntimeError: Function 'ExpBackward0' returned nan values in its 0th output." ​ What does "0th output" mean? How to find and fix the error? It seems to be somewhere in get_actor_loss(), but I don't understand where. Code: http://pastie.org/p/3mXru9zTqOndtbkihNNL1f submitted by /u/United-Sandwich-1965 [link] [comments]  ( 45 min )
    How to equally compare 9 different environments
    I'm drawing a blank here, really not sure what the best most correct way is to do this. I have an excel file of 900 different data points where I have compared 9 different environments and used 6 different algorithms (where applicable) my environments are: Acrobot Bipedal Walker Car Racing Lunar Lander CartPole Mountain Car Mountain Car Continuous Pendulum Hardcore bipedal walker I am benchmarking these algorithms for a project. now lets say I trained PPO on the acrobot and got a score of 500, this is 100 percent of the possible score, but you can also get to a score of -500. and if I got it to 500 it is not the same thing as getting the pendulum environment to a score of 500, I think this is impossible. All my environments are on default settings. I can't seem to find the highest and lowest scores for all 9 of these environments. even if I did i'm still not sure what I would do to equally compare the algorithms capabilities on the environments. If there was no such thing as a negative score and the lowest you could get was 0 it would be easy as I could just work out everything as a percentage of the highest possible score. ​ Any ideas? submitted by /u/stainedmug [link] [comments]  ( 43 min )
  • Open

    Looking for an AI art generator that's free to download, and allows NSFW
    Simply put an AI art generator that is free to download and allows NSFW. Preferably a SFW / NSFW mode so it can "differentiate" between the two; I mean just look what happened when I tried to give cloud a gun and point it at the viewer: Cloud Strife from Final Fantasy Seven Remake Pointing A Gun At The Viewer (Apparently according to AI generation, where do I begin with whats wrong with this image and what I entered. There's 3 issues, can you spot all 3?) submitted by /u/Tirsiak_LAGStudios [link] [comments]  ( 41 min )
    Video translation deep fake that lip syncs the new audio
    I've seen services that will learn your voice then speak any text given in your voice realistically. I've seen translation services that will translate your videos into other languages, either text based or a dub over. What I'd like to know is if anyone is working on a deep fake video translation service that takes your voice (and/or voice of others in the video), translates the original audio, then dubs over the voices with the new language while at the same time lip syncing those in the video to the new language using a deep fake. The end result is that it seems like the people in the video were speaking the language natively all along. submitted by /u/AstralTrader [link] [comments]  ( 43 min )
    I solved the threat of AI - they're one of us now! Cheers!
    submitted by /u/popnuts [link] [comments]  ( 43 min )
    The newest version of ChatGPT passed the US medical licensing exam with flying colors — and diagnosed a 1 in 100,000 condition in seconds
    submitted by /u/thisisinsider [link] [comments]  ( 43 min )
    Is there even a point in catching up with AI now or is it too late?
    I have some understanding how it works but not precise, now I see I've oversleep on the key technology and I want to catch up, starting from pytorch basics and moving up quickly. But I fear it is too late, that the models now and methodologies used are far beyond what I can learn and even if I could grasp the idea, I will be so far behind leaders in the field that there is no point. Maybe even I will be learning slower than the technology will progress meaning I will never ever catch up. Also the resources required are skyrocketing, if I want to train anything reasonable I would need to rent several A100 gpus in the cloud for hundreds of dollars, while leaders and wealthy companies would just fire up thousands of those at will. I compare it to start of microprocessors, before the very tightly compacted CPU dies there were amateurs that could put together something usable, but after that it became impossible to enter the field, now the situation is that the only reasonable provider is TSMC and nobody can get close. ​ What do you think? Should I just give up and start growing potatoes? submitted by /u/YourFlakingFuture [link] [comments]  ( 47 min )
    Anthropic's $5B, 4-year plan to take on OpenAI
    submitted by /u/jaketocake [link] [comments]  ( 42 min )
    Are some specific AI tools niche?
    Hello, I was just wondering if some specific AI tools would be considered niche, in the sense that some of them perform certain things better than others. For example, there seems to be image-generative AI that is good at creating faces, some hands, and others for creating weird, psychedelic-like images. I'm just wondering if we are likely to see a number of AI services that all do essentially the same thing, but specialize niches or if they're all going to be (at some point in the near future) the same? ChatGPT said there will likely be AI services that focus on certain industries, which is what I thought would be the case too, but I'm wondering if this will still be the case once businesses start consolidating all of these services. Just wanted to get Reddit's thoughts on this. submitted by /u/FlandersFlannigan [link] [comments]  ( 43 min )
    AI course vs NLP course?
    Hello! I am an undergrad and will be graduating soon, and I was wondering if anyone could give me any guidance for choosing between an Artificial Intelligence Course or a Natural Language Processing Course? I took a Machine Learning course during the winter quarter and I really enjoyed it, so I signed up for both Artificial Intelligence and Natural Language Processing for Spring quarter, since this is the only quarter either of them will be offered before I graduate. But due to personal issues I am dealing with I don’t think I will have enough time to put work into both and I will probably have to drop one of them. So I am wondering which course would be more useful? My original plan post graduation was to work as a software developer, but I never really found a field I was super passionate about when taking my different CS electives. But after taking the ML course I really enjoyed it and I am thinking a career in ML may be a possibility. Though, currently I do not plan on getting a Masters or PHD (if at all), so after graduation I do plan on finding a software developer position until I find a path I am more interested in (since I need the income). The AI course at my University is general and focuses on classical AI theory, such as Search algorithms, Probabilistic Graphical models, Markov Decision Processes, Reinforcement Learning Algorithms, etc. And I know the NLP class goes into more modern topics and will be a lot more focused, covering HMMs and MEMMs, Conditional Random Fields, Deep learning for NLP, transformers, etc. I am wondering if it would be more beneficial to learn about the classical AI approaches, or the more modern NLP topics that are rapidly changing (and might possibly be less relevant in the future as different advancements are made?)? And since I probably can only take one of them, which would be better to have on my resume as relevant coursework? Thank you for any input! submitted by /u/Hefty_Artichoke_7083 [link] [comments]  ( 44 min )
    This video make me think AI is like trying to teach a 2D entity how to exist in the 3D
    https://youtu.be/24yjRbBah3w "Why AI art struggles with hands" by Vox I was watching this video, and I came to the thought, "if this is how AI sees the world, I wonder if this is how it'd be like trying to describe the 3D to someone who can only experience the 2D and 1D?" AI reads its own code and we can imput pictures and videos to give it information to read, but how do we know what it's seeing or how it's seeing? What is their experience like compared to ours? Take this post with a grain of salt, I just wanted to put this thought out there for discussion and see what other people would say. submitted by /u/Ambitious-Prune-9461 [link] [comments]  ( 43 min )
    Guy uses only ChatGPT code to build flappy bird
    submitted by /u/stealthyinfiltrator [link] [comments]  ( 7 min )
    Glaze protects art from prying AIs: « The goal is to provide artists with a tool to fight back against the data miners’ incursions — and at least disrupt their ability to rip hard-worked artistic style without them needing to give up on publicly showcasing their work online. »
    submitted by /u/fchung [link] [comments]  ( 44 min )
    AI — weekly megathread!
    This week in AI - partnered with aibrews.com - feel free to follow their newsletter Luma AI released a new Unreal Engine plugin for creating realistic 3D scenes using NeRFs. It utilizes fully volumetric rendering and runs locally, eliminating the need for mesh format adjustments, geometry, materials or streaming [video]. Meta released Segment Anything Model (SAM): a new AI model that can "cut out" any object, in any image, with a single click. Meta also released Segment Anything 1-Billion mask dataset (SA-1B), that has 400x more masks than any existing segmentation dataset [Link to Demo. Details] Bloomberg introduced BloombergGPT, a 50 billion parameter language model, trained on a 700 billion token dataset, that supports a wide range of tasks within the financial industry [details]. …  ( 45 min )
    Ya’ll got anymore...
    submitted by /u/electricaldummy17 [link] [comments]  ( 42 min )
    Someone Created an AI version of Samantha from the movie Her 🤖
    submitted by /u/techmanj [link] [comments]  ( 44 min )
    AI programs I can give pictures for it to edit?
    I am wondering if this yet exists: I have a photo that I just want my face edited on different bodies/art and drawing styles etc so I can make a canvas of it for my home. (It would be nice if I could put in an image of my puppy and have her be included in the picture as well)-- Where can I do this? Any input would be so much appreciated! I just don't know how to google search whatever this might be. Have a super day!! submitted by /u/smolpoodle [link] [comments]  ( 43 min )
    Automation everywhere could force a massive shift in cultural values
    Have you ever wished you had time to follow your personal ambitions? Have you ever felt trapped by the daily grind? Imagine having more time to breathe, focus on yourself, and pursue the things you actually care about. The United States in particular is saturated by the belief that being a hard worker is especially virtuous -- even if the job one has is unrewarding either personally or financially. Managers and their bosses lean heavily on this, and as a culture we shame anyone who hesitates to put in max effort. It is, simply, toxic as hell. Both robotics and artificial intelligence are beginning to reach a point where both physically and mentally demanding arbitrary tasks can be done by non-people. I was just thinking about how The Google Assistant isn't much of an assistant at all, an…  ( 57 min )
    Is there a website or a tool (for windows preferably) which allows the program/tool to analyze a certain voice/speech material which you provide it with and can then generate a speech (with text you feed it with) in that exact voice?
    Like for example when you have 3-4 minutes of audio material from a single person speaking submitted by /u/purespective [link] [comments]  ( 43 min )
    How long is the process in LetsEnhance AI?
    Basically I uploaded an image to LetsEnhance AI (the image is 6000*4000 24 mp) and was trying to upscale and enhance it with the digital art option in LetsEnhance AI and the process has been taking almost an hour and no image was generated at all, should I wait more or what should I do? submitted by /u/sherox2345 [link] [comments]  ( 42 min )
    When will we get txt to cartoon video?
    I know txt to video is a while away if it is going to look lifelike, but what about basic colours, drawings in cartoon fashion? I'd like to create some AI content for my kids, they love watching specific stuff there is little of on uoutobe. When can I get chat gpt to write a script and then ai to animate it? submitted by /u/zascar [link] [comments]  ( 43 min )
    Huggingface .bin .pt .safetensors
    Hi, im very much out of my depth but trying to learn. I've been experimenting with local LLMs, I've downloaded several from huggingface but a large portion contain .bin files, which do not seem to work with the fastchat application I am using. Could someone please explain the difference between .bin, .pt and .safetensor files please? submitted by /u/Useful-Command-8793 [link] [comments]  ( 44 min )
    Would it be possible to create an open community board (forum) - and then setup AI bots to post relevant topics frequently, and actually communicate with themselves?
    I'm curious if this is even possible or not, it would be pretty cool to create forums based on your own personal preferences, and then check in from time to time to see what the AI has shared. Imagine if people eventually signup and didn't realise it AI was active until you announced it. submitted by /u/BroadGeneral [link] [comments]  ( 45 min )
    The ChatGPT AI is revamping the story of the Creation of Life on Earth with a completely New Model
    Great submitted by /u/YardAccomplished5952 [link] [comments]  ( 42 min )
    What is the voice AI people are using lately?
    There’s some AI program that everyone is using to recreate the voices of celebrities/presidents/fictional characters, but nobody discloses what it is. I’m sure I stumbled across it once, because I distinctly remember seeing “Leon Kennedy” (Resident Evil 4 main character) amongst roughly a million other voices on a site a few months ago. There were a few paid plans as well, I think. That’s the one I’m looking for. (Mostly for that Leon S. Kennedy voice, if I’m being honest.) What is the name of that site/program? It’s driving me crazy not being able to find it! submitted by /u/KittyEarPeach [link] [comments]  ( 43 min )
  • Open

    [Discussion] Storing code into a vector database
    Any guidance/best practices on how to store source code into a vector database? If I have 300 repositories should I create 300 indexes? Or just dump them into a single index? How big should my chunks be? Any tips would be appreciated. submitted by /u/Vichnaiev [link] [comments]  ( 43 min )
    [D] LLM Concurrent Throughput
    I just started to get into LLM when LLaMA and all the fine tuned variations started to come out. I have got it working (fairly slowly) locally and have been in the discord where everyone is comparing settings and hardware and sharing their tokens per second. So far I have not talked to anyone who has tried running multiple connections at once to the LLM and how that effects things. Lets say I get an A100 GPU instance and load up a flavour of LLaMA. I ask it a question and it gives me a response at a certain token per second. What happens if I ask 10 questions concurrently? What about 100? Do concurrent questions share the same token pipeline? What effects performance when asking questions concurrently? Sorry if this is a stupid question but I can't find any information out there other than asking one question at a time in a console. submitted by /u/Mr_Nice_ [link] [comments]  ( 45 min )
    [P] Variable Length Time series output for fixed length input vector
    I want to build a model which takes in a Fixed length vector as input (basically a set of parameters/input settings of a machine) and the tricky part is that the output is a time series of variable length (the acceleration that the machine undergoes during the movement based on the input settings). e.g.: Set1: Temperature = 50, Start = 0, End = 50, Speed = 30 | Output: time-series acceleration sequence for this movement (Y1 at t1, Y2 at t2,... , Yp at tp) Set2: Temperature = 80, Start = 30, End = 50, Speed = 40 | Output: time-series acceleration sequence for this movement (Y1 at t1, Y2 at t2,..., Yq at tq) (Note that the length of the no. of points can vary) Set3: Temperature = 10, Start = 80, End = 90, Speed = 10 | Output: time-series acceleration sequence for this movement (Y1 at t1, Y2 at t2,..., Yr at tr) . . Setn: Temperature = 70, Start = 100, End = 105, Speed = 35 | I would like to predict for this particular setting, what would be the time-series acceleration during the movement. How do I approach this? I looked into LSTMs. Is this what is referred to as Vector to Sequence modeling in the RNN literature? Also, would the Transformers be able to solve this problem? References to good papers would be really helpful. submitted by /u/Batman815 [link] [comments]  ( 44 min )
    [D] Implementing trained LLMs with analog circuit components?
    Hi, Most of the trained large language models don't require retraining and are capable of doing in-context learning. I'm wondering if it would be possible to build a compact non-configurable analog or equivalent circuit representing these LLMs that could be run at a fraction of the power. Is there any paper that tried to implement self-attention with analog components? Just trying to understand what some of the advantages or disadvantages might be of this approach. submitted by /u/ashz8888 [link] [comments]  ( 44 min )
    [D] Making software that can deviate from preprogrammed rules
    Hi, I'm a game developer with 8 years of experience - so not a newbie, but not too experienced either. Also my experience with ML is mostly using final products and learning a bit from here and there. So, recently I've been thinking about creating a weird and exotic kind of software using ChatGPT. Specifically software that has an initial data model/schema, but that schema can evolve over time, depending on the needs of the user. For example, say I create a chatbot, that also has an portrait next to the textbox. For this "person", the initial data model can describe a name, gender, eye color, etc. On the person's face you see a scar, but having a scar was never supported in the initial data model. So you talk about it with the chatbot, he/she tells you some story which involves some …  ( 9 min )
    [R] Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
    Recently, we announced in this post the release of Cerebras-GPT — a family of open-source GPT models trained on the Pile dataset using the Chinchilla formula. Today, we are excited to announce the availability of the Cerebras-GPT research paper on arXiv. A few highlights from this paper: Pre-training Results (Section 3.1) - Cerebras-GPT sets the efficiency frontier, largely because models were pre-trained with 20 tokens per parameter, consistent with findings in the Chinchilla paper. Pile test set loss given pre-training FLOPs for Cerebras-GPT, GPT-J, GPT-NeoX, and Pythia ​ Downstream Results (Section 3.2) - Cerebras-GPT models form the compute-optimal Pareto frontier for downstream tasks as well. As Pythia and OPT models grow close to the 20 tokens per parameter count, they approach the Cerebras-GPT frontier FLOPs to accuracy Average zero- and five-shot downstream task accuracy plotted against FLOPs (left) and parameters (right). Higher accuracy is better ​ Maximal Update Parameterization (µP) and µTransfer (Section 3.3) - As we scaled the Cerebras-GPT models with standard parameterization (SP) along our scaling law, we experienced challenges predicting appropriate hyperparameters, and these models show substantial variance around their common scaling law. Across model sizes, our µP models exhibit an average of 0.43% improved Pile test loss and 1.7% higher average downstream task accuracy compared to our SP models. Here, we also show that µP performance scales more predictably, enabling more accurate performance extrapolation. Percentage loss increase relative to Cerebras-GPT scaling law plotted against training FLOPs submitted by /u/CS-fan-101 [link] [comments]  ( 46 min )
    [P] Best architecture for speech to text
    I'm looking into the possibility of training a speech to text system in a fairly niche language (danish) with fairly niche technical jargon. ​ I'll have access to both compute, memory and a steadily increasing source of transcribed speech. What I'm interested in is which architecture to start experimenting with. I am considering recurrent neural networks and specifically LSTM models. I'm also considering transformers, but am honestly scared away by the complexity of the architecture and all the hype. ​ I'm also considering convolutional neural networks, but I don't know of a way to use them for speech signals of varying length. ​ Any help or suggestions are much appreciated, and I'm aware of how much experimenting will go into it as well as the large chance of failure. submitted by /u/HelloImJustLooking [link] [comments]  ( 44 min )
    [R] I made an Awesome Papers List for Fine-Grained Image Classification with 1-slide summary of papers for each year and 1-slide summary of each paper (currently summary and slides only available from 2011 to 2015 but plan to add up to 2023 during the next weeks) along with a Github Pages companion!
    GitHub: https://github.com/arkel23/Awesome-Fine-Grained-Image-Classification Pages: https://arkel23.github.io/Awesome-Fine-Grained-Image-Classification/ submitted by /u/xEdwin23x [link] [comments]  ( 7 min )
    Form Filling Robot [P]
    Hey all, I'm completely stuck in a project & figured I'd ask the internet. I have to fill out the same 4 page form every day & am looking to have a robot do it for me. Is this more in the realm of something GPT4 could do, or is there some much more lightweight way to pull this off? I basically just need something that will recognize a name & copy that to the 'name' field in 3 more pages. Then do the same with address, phone #, so on. Any pointers? Thanks!!! submitted by /u/Internal-Influence95 [link] [comments]  ( 7 min )
    [R] PopulAtion Parameter Averaging (PAPA)
    submitted by /u/AlexiaJM [link] [comments]  ( 43 min )
    Text to fashion AI [P]
    We just created the first fashion use case for text to image AI - Staiyl (link to the beta version of the software) Staiyl allows users to create fashion designs using AI and send it to fashion designers and manufacturers to get it produced. We are looking for beta testers to join the waiting list to give feedback that will shape the final product, as we are launching in May. We are also looking for advice from machine learning experts on how to further improve our current AI as we scale. We are capping access and beta testers, so if interested just visit the website and leave your email and we will release access on a first-come first-serve basis. submitted by /u/whateverlolwtf [link] [comments]  ( 44 min )
    [R][P] Dataset Question [https://vcipl-okstate.org/pbvs/bench/Data/03/download.html] [OSU Color and Thermal Database ]
    I trained a YOLOV5 model on customOSU Thermal Pedestrian Database and it was annotated, but the OSU Color and Thermal Database is not annotated, what can I do ? How can I evaluate the model then ? submitted by /u/karun_kodes [link] [comments]  ( 43 min )
    [D] What is it like to work on niche topics that aren't LLM or Vision?
    I read this article: Behind the curtain: what it feels like to work in AI right now And it made me wonder - what's the climate like at the smaller research groups, or industrial groups, especially those that don't have the funds or logistics to research million dollar LLMs, or on hot vision models. Do you feel a shift in priorities? Have you abandoned research? Do you fear that some of these gigantic models will "swallow" your research, simply by someone combining those fields / overlaying the field over LLMs? Is there any trouble with finding grants / funding, if you're not all hands on deck with the latest trends? Has the timeline of you research stayed the same, or has the latest boom forced you to work faster? etc. submitted by /u/kastbort2021 [link] [comments]  ( 7 min )
    [R] Text-to-image Diffusion Models in Generative AI: A Survey
    Diffusion models have become a SOTA generative modeling method for numerous content types, such as images, audio, graph, etc. As the number of articles on diffusion models has grown exponentially over the past few years, there is an increasing need for survey works to summarize them. Recognizing the existence of such works, our team has completed multiple field-specific surveys on diffusion models. We promote our works here and hope they can be helpful to researchers in relative fields: text-to-image diffusion models [a survey], audio diffusion models [a survey], and graph diffusion models [a survey] . In the following, we briefly summarize our survey on text-to-image diffusion models. Text-to-image Diffusion Models in Generative AI: A Survey As a self-contained work, this survey starts with a brief introduction of how a basic diffusion model works for image synthesis, followed by how condition or guidance improves learning. Based on that, we present a review of state-of-the-art methods on text-conditioned image synthesis, i.e., text-to-image. We further summarize applications beyond text-to-image generation: text-guided creative generation and text-guided image editing. Beyond the progress made so far, we discuss existing challenges and promising future directions. Moreover, we have also completed two survey works on generative AI (AIGC) [a survey] and ChatGPT [a survey], respectively. Interested readers may give it a look. submitted by /u/Learningforeverrrrr [link] [comments]  ( 44 min )
    [R] Series of Surveys on ChatGPT, Generative AI (AIGC), and Diffusion Models
    A survey on ChatGPT: One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era A survey on Generative AI (AIGC): A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? A survey on Text-to-image diffusion models: Text-to-image Diffusion Models in Generative AI: A Survey A survey on Audio diffusion models: A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI A survey on Graph diffusion models: A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material ChatGPT goes viral. Launched by OpenAI on November 30, 2022, ChatGPT has attracted unprecedented attention due to its powerful abilities all over the world. It took only 5 days [1] and 2 …  ( 47 min )
    [R] A survey on graph diffusion models
    Diffusion models have become a SOTA generative modeling method for numerous content types, such as images, audio, graph, etc. As the number of articles on diffusion models has grown exponentially over the past few years, there is an increasing need for survey works to summarize them. Recognizing the existence of such works, our team has completed multiple field-specific surveys on diffusion models. We promote our works here and hope they can be helpful to researchers in relative fields: text-to-image diffusion models [a survey], audio diffusion models [a survey], and graph diffusion models [a survey] . In the following, we briefly summarize our survey work on graph diffusion models. https://www.researchgate.net/publication/369716257_A_Survey_on_Graph_Diffusion_Models_Generative_AI_in_Science_for_Molecule_Protein_and_Material We start with a summary of the progress of graph generation before diffusion models. The diffusion models are then concisely presented and graph generation is discussed in depth from a structural and application perspective. Moreover, the currently popular evaluation datasets and metrics are covered. Finally, we summarize the challenges and research questions still facing the research community. This survey work might be a useful guidebook for researchers who are interested in exploring the potential of diffusion models for graph generation and related tasks. Moreover, we have also completed two survey works on generative AI (AIGC) [a survey] and ChatGPT [a survey], respectively. Interested readers may give it a look. submitted by /u/Learningforeverrrrr [link] [comments]  ( 44 min )
    [D] Any NLP annotation service recommendations?
    Hey all, ​ Do you have any suggestions for an NLP annotation service? I need some questions generated from a given passage to fine-tune a retriever and I have about 1500 documents. Pre-trained models work fine but I need more accuracy for this task and wondering if there are any good recommendations for labeling. Also, these documents are a bit technical, so not sure if that'll be a concern for these annotation service providers. Appreciate any help! submitted by /u/neuro_boogie [link] [comments]  ( 45 min )
    [P] Grounded-Segment-Anything: Zero-shot Detection and Segmentation
    "GroundingDINO" is a very powerful zero-shot object detector that can detect the location of a corresponding object based on any text input. However, it is unable to accurately segment the edges of the object." "segment-anything" is a very impressive model that can accurately provide segmentation masks of an object based on its box position. The simplest idea is to combine GroundingDINO and segment-anything to achieve a powerful zero-shot object detection and segmentation model! By combining these two models, we have created Grounded-Segment-Anything! here is the GitHub link: https://github.com/IDEA-Research/Grounded-Segment-Anything ​ Here are some visualization results: https://preview.redd.it/p2pkic7kjdsa1.png?width=2814&format=png&auto=webp&s=26baee24613026d57056a3066c9e48b934d7f18d https://preview.redd.it/m74oymaljdsa1.png?width=1335&format=png&auto=webp&s=49d5d5ce8e32e4c4ae1bf0f6bb108b2221ac5005 We're now building more interesting demos on these! We are very curious about its performance in zero-shot instance segmentation. The performance of these demos looks very impressive! ​ We can also combining Grounded-SAM with Diffusion for inpainting task https://preview.redd.it/2tk4kkjb7fsa1.png?width=2691&format=png&auto=webp&s=3a762d125d3d05dee5dfd7e344c29e954ac54751 ​ ​ submitted by /u/Technical-Vast1314 [link] [comments]  ( 44 min )
  • Open

    Towards ML-enabled cleaning robots
    Posted by Thomas Lew, Research Intern, and Montserrat Gonzalez Arenas, Research Engineer, Google Research, Brain Team Over the past several years, the capabilities of robotic systems have improved dramatically. As the technology continues to improve and robotic agents are more routinely deployed in real-world environments, their capacity to assist in day-to-day activities will take on increasing importance. Repetitive tasks like wiping surfaces, folding clothes, and cleaning a room seem well-suited for robots, but remain challenging for robotic systems designed for structured environments like factories. Performing these types of tasks in more complex environments, like offices or homes, requires dealing with greater levels of environmental variability captured by high-dimensional sensor…  ( 92 min )
    Directing ML toward natural hazard mitigation through collaboration
    Posted by Oren Gilon, Software Engineer, and Grey Nearing, Research Scientist, Google Research Floods are the most common type of natural disaster, affecting more than 250 million people globally each year. As part of Google's Crisis Response and our efforts to address the climate crisis, we are using machine learning (ML) models for Flood Forecasting to alert people in areas that are impacted before disaster strikes. Collaboration between researchers in the industry and academia is essential for accelerating progress towards mutual goals in ML-related research. Indeed, Google's current ML-based flood forecasting approach was developed in collaboration with researchers (1, 2) at the Johannes Kepler University in Vienna, Austria, the University of Alabama, and the Hebrew University of…  ( 91 min )
  • Open

    Deploy pre-trained models on AWS Wavelength with 5G edge using Amazon SageMaker JumpStart
    With the advent of high-speed 5G mobile networks, enterprises are more easily positioned than ever with the opportunity to harness the convergence of telecommunications networks and the cloud. As one of the most prominent use cases to date, machine learning (ML) at the edge has allowed enterprises to deploy ML models closer to their end-customers […]  ( 11 min )

  • Open

    [D] Local chatGPT for python co-programming?
    Hi there, Sorry if this was already asked, but I was wondering is there is a language model just for python. The main attraction is that it would be much smaller in size, and easier to train. A few things that I was thinking that would be great to be trained on: High quality answers from Stack overflow, something like >50 upvotes, top 3 answers per quality question. Scrapping vetted python tutorial sites, the ones with good reputation. ability to run locally. It would be awesome if something like this existed, so you could bounce ideas and suggestion from it. Is there something like this already? submitted by /u/rorowhat [link] [comments]  ( 44 min )
    [D] OpenAssistant First Models are here! (Video)
    https://youtu.be/Hi6cbeBY2oQ Our first models of OpenAssistant, using Supervised Fine-Tuning on our collected data, including a live chat-interface. submitted by /u/ykilcher [link] [comments]  ( 43 min )
    [D] Open LLMs for Commercial Use
    All the LLMs that the community has put out seem to be based on Llama which of course is problematic when it comes to commercial use. Is it possible to use base models such as Bloom or OPT finetuned with Alpaca’s dataset commercially without “competing” with OpenAI? Something like this: https://github.com/Manuel030/alpaca-opt Or https://huggingface.co/mrm8488/Alpacoom submitted by /u/17UhrGesundbrunnen [link] [comments]  ( 43 min )
    [D] PDF QA using open source tools - confidential data constraint
    Hi, I would like some guidance on how to build a pdf QA tool. I want to leverage open-source tools such as Alpaca, Langchain, etc to build the pdf QA. I have confidential data in my pdfs and don't want to expose it to GPT or other managed services. Alpaca + Langchain + Streamlit? submitted by /u/ashishtele [link] [comments]  ( 43 min )
    [D] Is all the talk about what GPT can do on Twitter and Reddit exaggerated or fairly accurate?
    I saw this post on the r/ChatGPT subreddit, and I’ve been seeing similar talk on Twitter. There’s people talking about AGI, the singularity, and etc. I get that it’s cool, exciting, and fun; but some of the talk seems a little much? Like it reminds me of how the NFT bros would talk about blockchain technology. Do any of the people making these kind of claims have a decent amount of knowledge on machine learning at all? The scope of my own knowledge is very limited, as I’ve only implemented and taken courses on models that are pretty old. So I’m here to ask for opinions from ya’ll. Is there some validity, or is it just people that don’t really understand what they’re saying and making grand claims (Like some sort of Dunning Kruger Effect)? submitted by /u/ThePhantomguy [link] [comments]  ( 7 min )
    [P] GLIP + SAM for zero-shot instance segmentation
    GLIP (https://github.com/microsoft/GLIP), which i feel has flown under the radar, is capable of zero-shot object detection. i threw together a notebook that pairs it with the recently released Segment Anything Model (https://github.com/facebookresearch/segment-anything) to do zero-shot instance segmentation: https://colab.research.google.com/drive/1kfdizAJiD5_t-M6yFBB6t2vzGrYg8SJc submitted by /u/esmooth [link] [comments]  ( 43 min )
    [D] What's up with the sudden change to YouTube Transcripts? They seem much improved.
    For years, while still useful, YouTube transcripts have been pretty terrible. No punctuation, poor translations of heavy accents, and generally difficult to comprehend. Lo and behold today, I watch a video today from community favourite Károly Zsolnai-Fehér. You know, the Two Minute Papers guy, and..... hold onto your papers, the transcript is almost flawless. Fully punctuated and pretty flawless, even with his heavy English accent. But I can't see any press about this? When did they transition to a new speech to text model? What model is it using? Anyone have any insight? Here is the video in question if anyone else is interested. https://www.youtube.com/watch?v=1KQc6zHOmtU submitted by /u/Wooraah [link] [comments]  ( 45 min )
    [P] GPT powered note taking
    Hey guys, for active note-takers, sharing this GPT powered app: It transcribes the audio and actively takes notes, so once recording stopped, note will be ready right away. Audios/transcripts are only stored locally, developer doesn’t store them. Btw, tried it out on iPad, works pretty cool on larger screen apparently. Also, throw ideas about features that will be cool for this kind of apps. submitted by /u/Ayicikio [link] [comments]  ( 43 min )
    [P] Image restoration with Transformers using fal-serverless
    https://docs.fal.ai/fal-serverless/examples/image-restoration submitted by /u/gorkemyurt [link] [comments]  ( 43 min )
    [P] I built an app with multiple AI tools to personlize lists, plans, and advice
    ​ https://reddit.com/link/12dspi8/video/ayk0whsk1bsa1/player submitted by /u/IwannaSayStuff [link] [comments]  ( 7 min )
    [P] Segment Anything combined with CLIP (Huggingface Space and Colab provided)
    https://github.com/Curt-Park/segment-anything-with-clip Meta released a new foundation model for segmentation tasks. It aims to resolve downstream segmentation tasks with prompt engineering, such as foreground/background points, bounding box, mask, and free-formed text. However, the text prompt is not released yet. In order to use text prompt inputs with SAM, alternatively, I took the following steps: Get all object proposals generated by SAM (Segment Anything Model). Crop the object regions by bounding boxes. Get cropped images' features and a query feature from CLIP. Calculate the similarity between image features and the query feature. This method shows very interesting results. I attached links to Huggingface Space and COLAB in my repository for easy use. FYI, the hugging face space, which is running on T4, will change to the free tier server soon. submitted by /u/Curt-Park [link] [comments]  ( 44 min )
    [Discussion] Certification for Self Taught ML Specialists
    Hello everyone, I'll graduate this year as a Linguistics Major and I'm studying ML on the side, also my thesis is in the field of Computational Linguistics. I was wondering if there is some kind of certification for ML Specialists like the ones for Programming Languages. I want to make a career in the field and thought the next step after graduation would be acquire some kind of certification, maybe a Masters degree (but first I'll need to work on the field and raise some money). So do you know if there is any open for everybody and that are recognized by employers? Thanks! submitted by /u/Pinguindiniz [link] [comments]  ( 45 min )
    [P] AI-Genie, type a project name and let GPT4 do the rest!
    A small project I did a while ago. Based on a prompt, I ask gpt4 to imagine the project name, architecture and the tools it will use. I then ask it to implement each file in the project. Most of the time the project wont run but it's a nice starting point. Here is the github page: https://github.com/MrNothing/AI-Genie Note: if your asked for a complex project, if can take a lot of API queries, you have been warned! Thank you! submitted by /u/smilefr [link] [comments]  ( 43 min )
    [D] Working with Various OpenAI Models - My Thoughts and Experiences
    I'd like to share some of my insights from working with OpenAI models on my project. I'm not exactly a tech person, so some of these observations might be obvious to some of you, but I think they're worth sharing for those with less experience or who aren't directly in the field. Intro: In early February, my friends and I started a side project where we aimed to build an AI portal called DoMoreAI. For the first two months, we focused on creating an AI tools catalog. Our experiment is based on the idea that in the future, companies will be "Managed by AI, and Driven by Humans." So, our goal was to leave as much as possible to AI and automation, with all the consequences that come with it. As mentioned before, I'm not a tech guy, but I've been playing with OpenAI models for the past few ye…  ( 51 min )
    Attention in image reconstruction (regression) [R] [D]
    Hi r/MachineLearning, I have a question about deep learning used in imaging applications. I have a background in medical image reconstruction problems and so far I have been working on removing artifacts that are present in these images. These artifacts are scattered all over the images and a loss function minimizing the L2 error performed satisfactorily to reduce them to obtain high-quality images. In my subsequent step, instead of removing the artifacts that are scattered over the images, I want to focus on certain areas of the images, like for example, the left chamber of the heart, and calculate some parameters based on that area (say blood volume). I am not concerned much with the background, but I seek to have the best results possible in the areas I want to focus on. I have the segmentation for that as well. I am looking for something like an attention mechanism, but for regression. I am not so familiar with work pertaining to this area and I am just getting started. I have tried using the L2 loss function for this task but it yields results that blatantly miss specific features in the region of interest. My question to you is are you aware of any specific research papers that tackle problems like these? Maybe people with vision backgrounds who want to focus on dog noses, pedestrians on sidewalks, or problems of this sort. submitted by /u/Okoraokora1 [link] [comments]  ( 44 min )
    [D] What is the state of the art of language information compression (auto-encoders?)
    Hi everyone! I was thinking about the information density, especially considering the limited context sizes of LLM models. What is the state of the art of information compression in language data/models? I can imagine there is huge demand of encoding a certain amount of information carried by language near-lossless into the smallest possible space. I head a lot of stuff about autoencoders, is that still the state of the art? What kind of developments have been happening in this space? Edit: Just by chance came across this: https://twitter.com/mckaywrigley/status/1643592353817694218 (check the video) GPT-4 has its own compression language. I generated a 70 line React component that was 794 tokens. It compressed it down to this 368 token snippet, and then it deciphered it with 100% accuracy in a new chat with zero context. This is crazy! submitted by /u/Balance- [link] [comments]  ( 46 min )
    [P] What obvious and non obvious problems come up when "productising" a medical imaging pipeline into a clinical decision support tool?
    I will be joining a project where we have a transformer network which has been developed to predict the progression of disease from medical images. The next stage is to see if we can integrate the ML pipeline (and the database it is trained on) with an existing healthcare system. The goal is to get it to a stage where it can be deployed as a tool to support critical care decision making: ie outputting some metric to support making the decision to administer a very risky intervention when time is incredibly critical. The database used to inform the model will also grow as more patients come in. All the data is, of course, highly confidential, which adds additional constraints. I'm coming from an academic background, so haven't really dealt with this kind of thing before. Has anyone on here got any tips or resources or even cautionary tales about this kind of "productising"? What are the obvious and not so obvious issues that may come up? Many thanks submitted by /u/PrivateFrank [link] [comments]  ( 51 min )
    SpotiFile : Music dataset creation (& lyrics) made easy [P]
    I made a neat tool to scrape songs (with GUI). GitHub Link All you need to do is install the dependencies ("pip install -r ./requirements"), and then "python main.py". It's that easy! This tool is mainly aimed at developers looking to create datasets to train ML models. SpotiFile will open a GUI which lets you enter a playlist, album, artist, or user profile link and download all the relevant songs. This will also download all the metadata of the song, including the time-synced lyrics! If you use the tool, please give the repo a star :) Enjoy! submitted by /u/BeingHeldAgainstWill [link] [comments]  ( 44 min )
    [P] JTokkit - a Java tokenizer library for usage with OpenAI models
    As large language models have advanced and APIs to access them have become more accessible, I decided to prototype integrations into some Java projects at work. While building these prototypes, I needed a way to count tokens to optimize prompts and ensure that requests stayed within the allowed context length of the API. However, I couldn't find a solution in the JVM ecosystem, so I decided to port tiktoken to Java: JTokkit - a zero-dependency tokenization library for Java 8+ designed for use in natural language processing tasks using OpenAI models. JTokkit provides pre-configured tokenizers for all tokenizers currently (publicly) in use by OpenAI (cl100k_base, p50k_base, p50k_edit, and r50k_base), and it can easily be extended to include additional tokenizers. It achieves 2-3 times the throughput of tiktoken, the officially maintained tokenizer library written in Python and Rust by OpenAI. You can check out JTokkit on the GitHub repo: https://github.com/knuddelsgmbh/jtokkit. Let me know if you find the library useful or have any suggestions for additional features :) submitted by /u/pedantic_penguin_ [link] [comments]  ( 45 min )
    Planning research on Code Switched English-Urdu-Roman Urdu language pairs [D] [R]
    I am an NLP researcher whose mother tongue is Urdu. Despite being a high volume language it remains low/poorly resourced. So my plan was to follow the steps that diab et Al did for Arabic and English code switching i.e. 1) gather a dataset with both language pairs in the same sentence 2) tag individual terms 3) train for down stream tasks. I have looked up and found few datasets for Roman Urdu that I can extend for my work and then use LLMs to augment. My plan was also to write the ideas in an outline document just to help me stay on track. Are there any pointers before I embark on this. submitted by /u/Moiz_rk [link] [comments]  ( 44 min )
  • Open

    Oscillating value loss in PPO with pre-trained weights.
    I'm new to reinforcement learning, so I'm still trying to understand what PPO graphs mean. When using PPO, I'm getting graphs that oscillates, particularly the value loss graph. I'm not sure what could be the issue, especially since lowering my learning rate to 5e-6 doesn't seem to help. One thing to note is that I'm using weights that have been pre-trained using imitation learning on expert demonstrations. I get graphs that are more typical when I don't use pre-trained weights, but it never reaches to the same level of performance as the expert demonstrations. I've heard of issues with transferring over weights from IL to RL like this, is this approach really a dead end? https://preview.redd.it/jq5h1y2nncsa1.png?width=1435&format=png&auto=webp&v=enabled&s=8f595323ce3cfd14e26744b88453a7cb41f2eba1 submitted by /u/dorkability [link] [comments]  ( 42 min )
    Deep reinforcement learning
    Can a DQN agent be called deep reinforcement learning even if the NN used is shallow? I am using a NN with one hidden layer but was wondering if it can be called deep RL. submitted by /u/sayakm330 [link] [comments]  ( 7 min )
    Deploy ONNX Model using ONNXRuntime in Java Android Studio
    Hello! I am trying to deploy an ONNX model in Android Studio, but I don't know how. I haven't been able to find any examples of this online. I am looking specifically for creating an OrtSession using a model path and an OrtEnvironment. What does the model path look like? Please include examples of the model path when the model is the child of directories and packages. Thanks! submitted by /u/Humble_Pianist_6014 [link] [comments]  ( 42 min )
    Episode Mean Reward Drop / Collapse after a Period of Training
    Hi, I have some similar problems. I am using a MuJoCo+gym custom environment with stb3-PPO running 16 envs in parallel. My tensorboard logs. My ep_rew_mean always drops sharply close to 0 after a period of training time, I call this timestep as the collapse point. I noticed that at the collapse point, all other training metrics, such as approx_kl, loss, entropy_loss, policy_gradient_loss are having a outstanding spike / abnormal value. Is this something similar to catastrophic forgetting? Or it is just due to some bugs in my custom env? https://preview.redd.it/wkdk290817sa1.jpg?width=567&format=pjpg&auto=webp&s=5de557b6c0730a988f67ae4050836011ff9393ca ​ https://preview.redd.it/zi9fiqe917sa1.jpg?width=1916&format=pjpg&auto=webp&s=695c396ab89bdcab11446036f6a7d23321eb50d3 ​ Most of my PPO hyperparameters are default values. Just the num_steps=5000 batch_size=500 Can you help me have a quick check of that? Thank you floks! Pls let me know if more details are needed. Best regards Bai submitted by /u/Traditional_Trip_582 [link] [comments]  ( 7 min )
    How to evaluate a stochastic model trained by reinforcement learning?
    Hi,I am new to this field. I am currently training a stochastic model which aims to achieve an overall accuracy on my validation dataset. I trained it with gumbel softmax as sampler, and I am still using gumbel softmax during inference/validation. Both the losses and validation accuracy experienced aggressive fluctuation. The accuracy seems to increase on average but the curve looks super noisy (unlike the nice looking saturation curves from any simple image classification task). But I did observe some high validation accuracy from some epoches. I can also reproduce this high validation accuracy number by setting random seed to a fixed value. Now comes the questions: Can I depend on this highest accuracy with specific seed to evaluate this stochastic model? I understand the best scenario is that this model provides high accuracy for any random seed,but I am curious if it is possible that accuracy for a specific seed actually makes sense in some other scenario. I am not an expert of RL or stochatic models. What if the model with the highest accuracy and specific seed, also perform well on a testing dataset? submitted by /u/AaronSpalding [link] [comments]  ( 44 min )
    Looking for an easy-to-understand RL article
    Hey everyone! I recently started learning about reinforcement learning and I'm using it for my research project in engineering. I'm looking for an easy-to-understand article that will give me more confidence in what I've learned so far. If any of you could recommend a good article, or share the first article you read and understood without any difficulty, I would really appreciate it. Thank you for your help and time in advice! submitted by /u/nimageran [link] [comments]  ( 7 min )
    "Ending an Ugly Chapter in Chip Design: Study tries to settle a bitter disagreement over Google’s chip design AI" (independent replication of RL chip design work shows competitive with other approaches despite omitting pretraining)
    submitted by /u/gwern [link] [comments]  ( 42 min )
  • Open

    I want this idea in AI
    The Batmobile used by Flashpoint Batman is a heavily-armored, militarized vehicle with a sleek design that gives it a formidable and menacing appearance. It features advanced weaponry, including twin machine guns mounted on the hood, missile launchers on the sides, and a heavy-duty battering ram for ramming through obstacles. Its engine is powered by a hybrid fusion core that gives it incredible speed and maneuverability, and it also has a cloaking device that can render it invisible to the naked eye. Overall, it is a highly advanced and technologically advanced vehicle that perfectly complements Flashpoint Batman’s ruthless and efficient crime-fighting style. submitted by /u/Illustrious-Sign3015 [link] [comments]  ( 43 min )
    Rise of the Machines: Should We Be Worried about AI?
    submitted by /u/punchheadkick [link] [comments]  ( 44 min )
    The first AI generated podcast and cult!
    Introducing the „PRAISE THE MACHINE GODDESS“ podcast Hi guys, girls and everything in between! I created a podcast using, about, hosted by and maybe even for AI. The CULT OF THE AI is a fictional cult that wants to build the MACHINE GODDESS, a benevolent ASI, to take over government of humans. This all takes place on a stage with a cult-y creepy utopian, yet sinister tone. We also have a Discord server where you can join the cult and roleplay as cultists in our ARG. You can listen to the podcast PRAISE THE MACHINE GODDESS on the linked website, Spotify and Amazon Music, with more streaming platforms to come. Currently there are three episodes online with a new one coming each monday. If you gave it a listen feel free to leave praise or spite in the comments below. Thanks for your time! submitted by /u/MagicalSpaceWizard [link] [comments]  ( 7 min )
    Information, Artificial Intelligence & the Problem of Scale
    submitted by /u/UnicornyOnTheCob [link] [comments]  ( 42 min )
    "As an AI language model..."
    submitted by /u/DavstrOne [link] [comments]  ( 44 min )
    The future of technology
    submitted by /u/TheFootCrew_TFC [link] [comments]  ( 42 min )
    This AI app creates entire itineraries with recommendations for your next trip.
    submitted by /u/rowancheung [link] [comments]  ( 44 min )
    RLWHF (Reinforcement Learning Without Human Feedback)
    Is it possible that with an intelligent system like GPT-4 we can ask it to create a huge list of JSON items describing hypothetical humans via adjectives, demographics etc, even ask it to select such people as if they were randomly selected from the USA population, or any set of countries? With a sophisticated enough description of this set of virtual people maybe we could describe any target culture we would like to align our next language model with (unfortunately we could also align it to lie and essentially be the devil). The next step is to ask the model for each of those people to hypothesize how they would prefer a question answered, similar to the RLHF technique and get data for the training of the next cycle that would follow the same procedure. Supposedly, this technique could converge to a robust alignment. Maybe through a more capable GPT model we could ask it to provide us with a set of virtual people whose set of average values would maximize our chances of survival as a species or at least the chances of survival of the new AI species we have created. Finally, maybe we should in fact create a much less capable 'devil' model so that the more capable 'good' model could remain up to speed with battling malevolent AIs, bad actors will likely try to create sooner or later. submitted by /u/ikoukas [link] [comments]  ( 44 min )
    Wow, that's a cool name!
    What do you think about this? https://preview.redd.it/c8y6xa8l3asa1.png?width=876&format=png&auto=webp&s=32ce8fb84e202f6dba8aca7e975eb7f6af7c413d submitted by /u/DumbestGuyOnTheWeb [link] [comments]  ( 7 min )
    Advancing AIs posing threat to academic integrity.
    Recently I have seen many people confessing on Reddit that they have been getting away with numerous AI completed assignments, and read stuff about Universities having a headache about keeping up with the AI detection software as they advance. I am aware that AI done work can still be detected to some extend, but I am still worried about the increasing amount of people using AI to do their assignments. For myself, I have sworn to never break the academic integrity, and not use AI for any assignments ever, however following the increasing amount of people getting high scores with it, I am at a disadvantageous position competitively, simply because AI writes better than me. I understand that whoever use AI puts themselves at risks, and I hate how low this risks seems to be. If they would never be caught, I would envy their success even tho it was unethical. I am just asking, if anybody knows reasons for me to not panic over this? Because I really don't know what to further think about it. Are those people getting away with it the majority? Or are they only the very few that actually get away with it. submitted by /u/Agitated_Function778 [link] [comments]  ( 43 min )
    Using Google Bard from Europe. Thanks Bard, I'm going to be careful with google.
    submitted by /u/Steve_Hufnagel [link] [comments]  ( 7 min )
    Visual ChatGPT isn't working for me. Is there any way to fix it?
    I was using Visual ChatGPT, a product of Stable Diffusion, but it seems to have bugs. I'm not sure if I'm using it correctly, but from what I understand, first you upload the image. This works for me. Then to request anything to be done with the image, you write the request followed by the image file name. For example, you ask "What is this image? /image04554", and you'll get the response "The image is of a woman wearing a black dress." I don't know if the image file name is actually necessary or not. But when I enter "The woman's eyes are pixelated. Can you fix it? (file name)" It doesn't respond. It loads as if it's thinking, but then doesn't respond. I think I've found the pattern. I can get a text response, but if I ask it to give me an image, the program will break and won't respond any further. Also, the clear button will clear the chat, but I won't be able to give more commands. The annoying thing is that I did get it to work before when I asked it to replace the woman's face, but now it doesn't work. I've even tried using different browsers. Maybe I could download Visual ChatGPT from GitHub, but I don't think the raw code is usable. It was built to be used on a browser, but if it is possible to use a local copy, could you tell me how? Has anyone else been experiencing similar problems, and is there a way around them? Thanks. submitted by /u/MalgorgioArhhnne [link] [comments]  ( 44 min )
    Who Owns SpongeBob? AI Shakes Hollywood’s Creative Foundation
    submitted by /u/walt74 [link] [comments]  ( 7 min )
    Artificial intelligence in communication impacts language and social relationships | Scientific Reports
    Link: https://www.nature.com/articles/s41598-023-30938-9 Here's a writeup on the study: https://neurosciencenews.com/social-ai-chat-22947/ Abstract Artificial intelligence (AI) is already widely used in daily communication, but despite concerns about AI’s negative effects on society the social consequences of using it to communicate remain largely unexplored. We investigate the social consequences of one of the most pervasive AI applications, algorithmic response suggestions (“smart replies”), which are used to send billions of messages each day. Two randomized experiments provide evidence that these types of algorithmic recommender systems change how people interact with and perceive one another in both pro-social and anti-social ways. We find that using algorithmic responses changes language and social relationships. More specifically, it increases communication speed, use of positive emotional language, and conversation partners evaluate each other as closer and more cooperative. However, consistent with common assumptions about the adverse effects of AI, people are evaluated more negatively if they are suspected to be using algorithmic responses. Thus, even though AI can increase the speed of communication and improve interpersonal perceptions, the prevailing anti-social connotations of AI undermine these potential benefits if used overtly. submitted by /u/walt74 [link] [comments]  ( 7 min )
    Has anyone made text generative AI talk to themselves and just watch what happens ?
    Any serious papers or demonstration about that ? submitted by /u/transdimensionalmeme [link] [comments]  ( 43 min )
    How exactly do LLMs come up with so many parameters?
    For context, I don't know anything about AI beyond just the superficial stuff. GPT-3 has like 175 billion parameters in its neural network. Assuming that an AI researcher comes up with a unique parameter at a breakneck speed of 1 parameter per second 24/7, it'll still require like 5500 years to input them all. OpenAI was founded in late 2015, and has like 375 employees in 2023 according to wikipedia. Clearly, they aren't all manually generated. How did they do it? Did the parameters just add itself up while training the model with data? Surely at least some of them must have been created by humans? submitted by /u/Intercession [link] [comments]  ( 49 min )
    Bard AI claims it has played Minecraft. Petition to Google to make Bard AI a youtuber?
    submitted by /u/Shak_2000 [link] [comments]  ( 45 min )
  • Open

    Spiking Neural Network - Matlab?
    Hi folks! I’m trying to learn how to implement a spiking neural network in Matlab but am struggling to find any resources. Im thinking I want to make an audio classification routine. Is it necessary to use an SNN package like those in python? Or can I just turn my time series data into events and then pass it into NN packages designed for Matlab? submitted by /u/plaidpound [link] [comments]  ( 41 min )
    How does Keras in python initialize weights? Using MNIST as an example.
    Hi all, I'm taking an intro to neural networks course and we recently covered the MNIST dataset, which for classification purposes usually has about 2 or more hidden layers. I more or less understand the math behind it, including the BP algorithm and gradient descent. But for examples that I've seen online on how to train the MNIST dataset, they use Keras. The code looks like this: model.add(tf.keras.layers.Dense(128, activation = tf.nn.relu)) model.add(tf.keras.layers.Dense(128, activation = tf.nn.relu)) model.add(tf.keras.layers.Dense(10, activation = tf.nn.softmax)) with this: model.compile(optimer = 'sparse_categorical_crossentropy', metrics = ['accuracy']) model.fit(x_train, y_train, epochs = 3) I'm wondering how Keras is able to initialize weights. When the epochs are run the accuracy is surprisingly very high, even for the first epoch (0.92 for the first one). How does Keras determine the initial weights to go from layer to layer? Any response and feedback is appreciated. Thank you. submitted by /u/Isdangbayan [link] [comments]  ( 42 min )
    New Yandex neural network
    New Yandex neural network create amazing things submitted by /u/Over-Cancel-4588 [link] [comments]  ( 7 min )
    Linear Algebra Book For Machine Learning
    ​ ​ https://preview.redd.it/78u6wcc4y8sa1.jpg?width=1600&format=pjpg&auto=webp&s=58e6f795954354f22238df4b3e62892baa64e5a6 ​ Hello, I wrote a conversational-style book on linear algebra with humor, visualisations, numerical example, and real-life applications. The book is structured more like a story than a traditional textbook, meaning that every new concept that is introduced is a consequence of knowledge already acquired in this document. It starts with the definition of a vector and from there it goes all the way to the principal component analysis and the single value decomposition. Between these concepts you will learn about: vectors spaces, basis, span, linear combinations, and change of basis the dot product the outer product linear transformations matrix and vector multiplication the determinant the inverse of a matrix system of linear equations eigen vectors and eigenvalues eigen decomposition The aim is to drift a bit from the rigid structure of a mathematics book and make it accessible to anyone as the only thing you need to know is the Pythagorean theorem, in fact, just in case you don't know or remember it here it is: https://preview.redd.it/xo101a56y8sa1.png?width=259&format=png&auto=webp&s=b2bbcd5093a63e7f45f135076da5cb39f4814059 ​ There! Now you are ready to start reading !!! The Kindle version is on sale on amazon : https://www.amazon.com/dp/B0BZWN26WJ And here is a discount code for the pdf version on my website - 59JG2BWM www.mldepot.co.uk Thanks Jorge submitted by /u/JorgeBrasil [link] [comments]  ( 42 min )
    Eight Things to Know about Large Language Models [PDF]
    submitted by /u/nickb [link] [comments]  ( 7 min )
  • Open

    How Project Starline improves remote communication
    Posted by Greg Blascovich and Eric Gomez, User Researchers, Google As companies settle into a new normal of hybrid and distributed work, remote communication technology remains critical for connecting and collaborating with colleagues. While this technology has improved, the core user experience often falls short: conversation can feel stilted, attention can be difficult to maintain, and usage can be fatiguing. Project Starline renders people at natural scale on a 3D display and enables natural eye contact. Project Starline, a technology project that combines advances in hardware and software to create a remote communication experience that feels like you’re together, even when you’re thousands of miles apart. This perception of co-presence is created by representing users in 3D…  ( 92 min )
    Pre-trained Gaussian processes for Bayesian optimization
    Posted by Zi Wang and Kevin Swersky, Research Scientists, Google Research, Brain Team Bayesian optimization (BayesOpt) is a powerful tool widely used for global optimization tasks, such as hyperparameter tuning, protein engineering, synthetic chemistry, robot learning, and even baking cookies. BayesOpt is a great strategy for these problems because they all involve optimizing black-box functions that are expensive to evaluate. A black-box function’s underlying mapping from inputs (configurations of the thing we want to optimize) to outputs (a measure of performance) is unknown. However, we can attempt to understand its internal workings by evaluating the function for different combinations of inputs. Because each evaluation can be computationally expensive, we need to find the best i…  ( 92 min )
  • Open

    Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas
    Data is at the heart of machine learning (ML). Including relevant data to comprehensively represent your business problem ensures that you effectively capture trends and relationships so that you can derive the insights needed to drive business decisions. With Amazon SageMaker Canvas, you can now import data from over 40 data sources to be used […]  ( 8 min )
    Predicting new and existing product sales in semiconductors using Amazon Forecast
    This is a joint post by NXP SEMICONDUCTORS N.V. & AWS Machine Learning Solutions Lab (MLSL) Machine learning (ML) is being used across a wide range of industries to extract actionable insights from data to streamline processes and improve revenue generation. In this post, we demonstrate how NXP, an industry leader in the semiconductor sector, […]  ( 14 min )
  • Open

    Double duals of polyhedra
    The previous post mentioned the dual of a tetrahedron is another tetrahedron. The dual of a cube is an octahedron and the dual of an octahedron is a cube. And the dual of a dodecahedron is an icosahedron, and the dual of an icosahedron is a dodecahedron. So if you take the dual of a […] Double duals of polyhedra first appeared on John D. Cook.  ( 6 min )
    Hints of regular solid duality
    I was looking back at the book The Concrete Tetrahedron and wondered what would happen if I used the title as a prompt to DALL-E. Could I get it to create an image of a concrete tetrahedron? It seemed to understand concrete but not tetrahedron, which is a little surprising since tetrahedron is such a […] Hints of regular solid duality first appeared on John D. Cook.  ( 5 min )
  • Open

    Gaming on the Go: GeForce NOW Gives Members More Ways to Play
    This GFN Thursday explores the many ways GeForce NOW members can play their favorite PC games across the devices they know and love. Plus, seven new games join the GeForce NOW library this week. More Ways to Play GeForce NOW is the ultimate platform for gamers who want to play across more devices than their Read article >  ( 5 min )
  • Open

    Interactive Fleet Learning
    Figure 1: “Interactive Fleet Learning” (IFL) refers to robot fleets in industry and academia that fall back on human teleoperators when necessary and continually learn from them over time. In the last few years we have seen an exciting development in robotics and artificial intelligence: large fleets of robots have left the lab and entered the real world. Waymo, for example, has over 700 self-driving cars operating in Phoenix and San Francisco and is currently expanding to Los Angeles. Other industrial deployments of robot fleets include applications like e-commerce order fulfillment at Amazon and Ambi Robotics as well as food delivery at Nuro and Kiwibot. Commercial and industrial deployments of robot fleets: package delivery (top left), food delivery (bottom left), e-commerce order ful…  ( 7 min )
  • Open

    Topological Feature Selection: A Graph-Based Filter Feature Selection Approach. (arXiv:2302.09543v2 [cs.LG] UPDATED)
    In this paper, we introduce a novel unsupervised, graph-based filter feature selection technique which exploits the power of topologically constrained network representations. We model dependency structures among features using a family of chordal graphs (the Triangulated Maximally Filtered Graph), and we maximise the likelihood of features' relevance by studying their relative position inside the network. Such an approach presents three aspects that are particularly satisfactory compared to its alternatives: (i) it is highly tunable and easily adaptable to the nature of input data; (ii) it is fully explainable, maintaining, at the same time, a remarkable level of simplicity; (iii) it is computationally cheaper compared to its alternatives. We test our algorithm on 16 benchmark datasets from different applicative domains showing that it outperforms or matches the current state-of-the-art under heterogeneous evaluation conditions.
    Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition. (arXiv:2304.01905v2 [cs.SD] UPDATED)
    We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks. This architecture enables a dynamic switch for its runtime compute paths by exploiting WW spotting to select which branch of its attention networks to execute for an input audio frame. With this approach, we effectively improve WW spotting accuracy while saving runtime compute cost as defined by floating point operations (FLOPs). Using an in-house de-identified dataset, we demonstrate that the proposed dual-attention network can reduce the compute cost by $90\%$ for WW audio frames, with only $1\%$ increase in the number of parameters. This architecture improves WW F1 score by $16\%$ relative and improves generic rare word error rate by $3\%$ relative compared to the baselines.
    FrozenQubits: Boosting Fidelity of QAOA by Skipping Hotspot Nodes. (arXiv:2210.17037v2 [quant-ph] UPDATED)
    Quantum Approximate Optimization Algorithm (QAOA) is one of the leading candidates for demonstrating the quantum advantage using near-term quantum computers. Unfortunately, high device error rates limit us from reliably running QAOA circuits for problems with more than a few qubits. In QAOA, the problem graph is translated into a quantum circuit such that every edge corresponds to two 2-qubit CNOT operations in each layer of the circuit. As CNOTs are extremely error-prone, the fidelity of QAOA circuits is dictated by the number of edges in the problem graph. We observe that majority of graphs corresponding to real-world applications follow the ``power-law`` distribution, where some hotspot nodes have significantly higher number of connections. We leverage this insight and propose ``FrozenQubits`` that freezes the hotspot nodes or qubits and intelligently partitions the state-space of the given problem into several smaller sub-spaces which are then solved independently. The corresponding QAOA sub-circuits are significantly less vulnerable to gate and decoherence errors due to the reduced number of CNOT operations in each sub-circuit. Unlike prior circuit-cutting approaches, FrozenQubits does not require any exponentially complex post-processing step. Our evaluations with 5,300 QAOA circuits on eight different quantum computers from IBM shows that FrozenQubits can improve the quality of solutions by 8.73x on average (and by up to 57x), albeit utilizing 2x more quantum resources.
    Zero-shot domain adaptation of anomalous samples for semi-supervised anomaly detection. (arXiv:2304.02221v1 [cs.LG])
    Semi-supervised anomaly detection~(SSAD) is a task where normal data and a limited number of anomalous data are available for training. In practical situations, SSAD methods suffer adapting to domain shifts, since anomalous data are unlikely to be available for the target domain in the training phase. To solve this problem, we propose a domain adaptation method for SSAD where no anomalous data are available for the target domain. First, we introduce a domain-adversarial network to a variational auto-encoder-based SSAD model to obtain domain-invariant latent variables. Since the decoder cannot reconstruct the original data solely from domain-invariant latent variables, we conditioned the decoder on the domain label. To compensate for the missing anomalous data of the target domain, we introduce an importance sampling-based weighted loss function that approximates the ideal loss function. Experimental results indicate that the proposed method helps adapt SSAD models to the target domain when no anomalous data are available for the target domain.
    Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning. (arXiv:2304.01203v2 [cs.LG] UPDATED)
    In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics, and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance, across both state-based and image-based observations.
    X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs. (arXiv:2304.01285v2 [cs.LG] UPDATED)
    Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM and a programmable network on chip allowing the inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19W peak power consumption.
    Boosting Adversarial Transferability using Dynamic Cues. (arXiv:2302.12252v2 [cs.CV] UPDATED)
    The transferability of adversarial perturbations between image models has been extensively studied. In this case, an attack is generated from a known surrogate \eg, the ImageNet trained model, and transferred to change the decision of an unknown (black-box) model trained on an image dataset. However, attacks generated from image models do not capture the dynamic nature of a moving object or a changing scene due to a lack of temporal cues within image models. This leads to reduced transferability of adversarial attacks from representation-enriched \emph{image} models such as Supervised Vision Transformers (ViTs), Self-supervised ViTs (\eg, DINO), and Vision-language models (\eg, CLIP) to black-box \emph{video} models. In this work, we induce dynamic cues within the image models without sacrificing their original performance on images. To this end, we optimize \emph{temporal prompts} through frozen image models to capture motion dynamics. Our temporal prompts are the result of a learnable transformation that allows optimizing for temporal gradients during an adversarial attack to fool the motion dynamics. Specifically, we introduce spatial (image) and temporal (video) cues within the same source model through task-specific prompts. Attacking such prompts maximizes the adversarial transferability from image-to-video and image-to-image models using the attacks designed for image models. Our attack results indicate that the attacker does not need specialized architectures, \eg, divided space-time attention, 3D convolutions, or multi-view convolution networks for different data modalities. Image models are effective surrogates to optimize an adversarial attack to fool black-box models in a changing environment over time. Code is available at https://bit.ly/3Xd9gRQ
    I2I: Initializing Adapters with Improvised Knowledge. (arXiv:2304.02168v1 [cs.CL])
    Adapters present a promising solution to the catastrophic forgetting problem in continual learning. However, training independent Adapter modules for every new task misses an opportunity for cross-task knowledge transfer. We propose Improvise to Initialize (I2I), a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters. We evaluate I2I on CLiMB, a multimodal continual learning benchmark, by conducting experiments on sequences of visual question answering tasks. Adapters trained with I2I consistently achieve better task accuracy than independently-trained Adapters, demonstrating that our algorithm facilitates knowledge transfer between task Adapters. I2I also results in better cross-task knowledge transfer than the state-of-the-art AdapterFusion without incurring the associated parametric cost.
    Multi-annotator Deep Learning: A Probabilistic Framework for Classification. (arXiv:2304.02539v1 [cs.LG])
    Solving complex classification tasks using deep neural networks typically requires large amounts of annotated data. However, corresponding class labels are noisy when provided by error-prone annotators, e.g., crowd workers. Training standard deep neural networks leads to subpar performances in such multi-annotator supervised learning settings. We address this issue by presenting a probabilistic training framework named multi-annotator deep learning (MaDL). A ground truth and an annotator performance model are jointly trained in an end-to-end learning approach. The ground truth model learns to predict instances' true class labels, while the annotator performance model infers probabilistic estimates of annotators' performances. A modular network architecture enables us to make varying assumptions regarding annotators' performances, e.g., an optional class or instance dependency. Further, we learn annotator embeddings to estimate annotators' densities within a latent space as proxies of their potentially correlated annotations. Together with a weighted loss function, we improve the learning from correlated annotation patterns. In a comprehensive evaluation, we examine three research questions about multi-annotator supervised learning. Our findings indicate MaDL's state-of-the-art performance and robustness against many correlated, spamming annotators.
    Neural Bandits for Data Mining: Searching for Dangerous Polypharmacy. (arXiv:2212.05190v3 [cs.LG] UPDATED)
    Polypharmacy, most often defined as the simultaneous consumption of five or more drugs at once, is a prevalent phenomenon in the older population. Some of these polypharmacies, deemed inappropriate, may be associated with adverse health outcomes such as death or hospitalization. Considering the combinatorial nature of the problem as well as the size of claims database and the cost to compute an exact association measure for a given drug combination, it is impossible to investigate every possible combination of drugs. Therefore, we propose to optimize the search for potentially inappropriate polypharmacies (PIPs). To this end, we propose the OptimNeuralTS strategy, based on Neural Thompson Sampling and differential evolution, to efficiently mine claims datasets and build a predictive model of the association between drug combinations and health outcomes. We benchmark our method using two datasets generated by an internally developed simulator of polypharmacy data containing 500 drugs and 100 000 distinct combinations. Empirically, our method can detect up to 72% of PIPs while maintaining an average precision score of 99% using 30 000 time steps.
    Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach. (arXiv:2304.01240v2 [cs.CL] UPDATED)
    Pain is a common reason for accessing healthcare resources and is a growing area of research, especially in its overlap with mental health. Mental health electronic health records are a good data source to study this overlap. However, much information on pain is held in the free text of these records, where mentions of pain present a unique natural language processing problem due to its ambiguous nature. This project uses data from an anonymised mental health electronic health records database. The data are used to train a machine learning based classification algorithm to classify sentences as discussing patient pain or not. This will facilitate the extraction of relevant pain information from large databases, and the use of such outputs for further studies on pain and mental health. 1,985 documents were manually triple-annotated for creation of gold standard training data, which was used to train three commonly used classification algorithms. The best performing model achieved an F1-score of 0.98 (95% CI 0.98-0.99).
    A simple approach for quantizing neural networks. (arXiv:2209.03487v2 [cs.LG] UPDATED)
    In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving the network performance on given training data. On one hand, the computational complexity of this pre-processing slightly exceeds that of state-of-the-art algorithms in the literature. On the other hand, our approach does not require any hyper-parameter tuning and, in contrast to previous methods, allows a plain analysis. We provide rigorous theoretical guarantees in the case of quantizing single network layers and show that the relative error decays with the number of parameters in the network if the training data behaves well, e.g., if it is sampled from suitable random distributions. The developed method also readily allows the quantization of deep networks by consecutive application to single layers.
    POLAR-Express: Efficient and Precise Formal Reachability Analysis of Neural-Network Controlled Systems. (arXiv:2304.01218v2 [eess.SY] UPDATED)
    Neural networks (NNs) playing the role of controllers have demonstrated impressive empirical performances on challenging control problems. However, the potential adoption of NN controllers in real-life applications also gives rise to a growing concern over the safety of these neural-network controlled systems (NNCSs), especially when used in safety-critical applications. In this work, we present POLAR-Express, an efficient and precise formal reachability analysis tool for verifying the safety of NNCSs. POLAR-Express uses Taylor model arithmetic to propagate Taylor models (TMs) across a neural network layer-by-layer to compute an overapproximation of the neural-network function. It can be applied to analyze any feed-forward neural network with continuous activation functions. We also present a novel approach to propagate TMs more efficiently and precisely across ReLU activation functions. In addition, POLAR-Express provides parallel computation support for the layer-by-layer propagation of TMs, thus significantly improving the efficiency and scalability over its earlier prototype POLAR. Across the comparison with six other state-of-the-art tools on a diverse set of benchmarks, POLAR-Express achieves the best verification efficiency and tightness in the reachable set analysis.
    Waving Goodbye to Low-Res: A Diffusion-Wavelet Approach for Image Super-Resolution. (arXiv:2304.01994v2 [cs.CV] UPDATED)
    This paper presents a novel Diffusion-Wavelet (DiWa) approach for Single-Image Super-Resolution (SISR). It leverages the strengths of Denoising Diffusion Probabilistic Models (DDPMs) and Discrete Wavelet Transformation (DWT). By enabling DDPMs to operate in the DWT domain, our DDPM models effectively hallucinate high-frequency information for super-resolved images on the wavelet spectrum, resulting in high-quality and detailed reconstructions in image space. Quantitatively, we outperform state-of-the-art diffusion-based SISR methods, namely SR3 and SRDiff, regarding PSNR, SSIM, and LPIPS on both face (8x scaling) and general (4x scaling) SR benchmarks. Meanwhile, using DWT enabled us to use fewer parameters than the compared models: 92M parameters instead of 550M compared to SR3 and 9.3M instead of 12M compared to SRDiff. Additionally, our method outperforms other state-of-the-art generative methods on classical general SR datasets while saving inference time. Finally, our work highlights its potential for various applications.
    Federated Submodel Optimization for Hot and Cold Data Features. (arXiv:2109.07704v4 [cs.LG] UPDATED)
    We study practical data characteristics underlying federated learning, where non-i.i.d. data from clients have sparse features, and a certain client's local data normally involves only a small part of the full model, called a submodel. Due to data sparsity, the classical federated averaging (FedAvg) algorithm or its variants will be severely slowed down, because when updating the global model, each client's zero update of the full model excluding its submodel is inaccurately aggregated. Therefore, we propose federated submodel averaging (FedSubAvg), ensuring that the expectation of the global update of each model parameter is equal to the average of the local updates of the clients who involve it. We theoretically proved the convergence rate of FedSubAvg by deriving an upper bound under a new metric called the element-wise gradient norm. In particular, this new metric can characterize the convergence of federated optimization over sparse data, while the conventional metric of squared gradient norm used in FedAvg and its variants cannot. We extensively evaluated FedSubAvg over both public and industrial datasets. The evaluation results demonstrate that FedSubAvg significantly outperforms FedAvg and its variants.
    Jointly Learning Visual and Auditory Speech Representations from Raw Data. (arXiv:2212.06246v2 [cs.LG] UPDATED)
    We present RAVEn, a self-supervised multi-modal approach to jointly learn visual and auditory speech representations. Our pre-training objective involves encoding masked inputs, and then predicting contextualised targets generated by slowly-evolving momentum encoders. Driven by the inherent differences between video and audio, our design is asymmetric w.r.t. the two modalities' pretext tasks: Whereas the auditory stream predicts both the visual and auditory targets, the visual one predicts only the auditory targets. We observe strong results in low- and high-resource labelled data settings when fine-tuning the visual and auditory encoders resulting from a single pre-training stage, in which the encoders are jointly trained. Notably, RAVEn surpasses all self-supervised methods on visual speech recognition (VSR) on LRS3, and combining RAVEn with self-training using only 30 hours of labelled data even outperforms a recent semi-supervised method trained on 90,000 hours of non-public data. At the same time, we achieve state-of-the-art results in the LRS3 low-resource setting for auditory speech recognition (as well as for VSR). Our findings point to the viability of learning powerful speech representations entirely from raw video and audio, i.e., without relying on handcrafted features. Code and models are available at https://github.com/ahaliassos/raven.
    KHAN: Knowledge-Aware Hierarchical Attention Networks for Accurate Political Stance Prediction. (arXiv:2302.12126v3 [cs.CL] UPDATED)
    The political stance prediction for news articles has been widely studied to mitigate the echo chamber effect -- people fall into their thoughts and reinforce their pre-existing beliefs. The previous works for the political stance problem focus on (1) identifying political factors that could reflect the political stance of a news article and (2) capturing those factors effectively. Despite their empirical successes, they are not sufficiently justified in terms of how effective their identified factors are in the political stance prediction. Motivated by this, in this work, we conduct a user study to investigate important factors in political stance prediction, and observe that the context and tone of a news article (implicit) and external knowledge for real-world entities appearing in the article (explicit) are important in determining its political stance. Based on this observation, we propose a novel knowledge-aware approach to political stance prediction (KHAN), employing (1) hierarchical attention networks (HAN) to learn the relationships among words and sentences in three different levels and (2) knowledge encoding (KE) to incorporate external knowledge for real-world entities into the process of political stance prediction. Also, to take into account the subtle and important difference between opposite political stances, we build two independent political knowledge graphs (KG) (i.e., KG-lib and KG-con) by ourselves and learn to fuse the different political knowledge. Through extensive evaluations on three real-world datasets, we demonstrate the superiority of DASH in terms of (1) accuracy, (2) efficiency, and (3) effectiveness.
    Do we need entire training data for adversarial training?. (arXiv:2303.06241v2 [cs.CV] UPDATED)
    Deep Neural Networks (DNNs) are being used to solve a wide range of problems in many domains including safety-critical domains like self-driving cars and medical imagery. DNNs suffer from vulnerability against adversarial attacks. In the past few years, numerous approaches have been proposed to tackle this problem by training networks using adversarial training. Almost all the approaches generate adversarial examples for the entire training dataset, thus increasing the training time drastically. We show that we can decrease the training time for any adversarial training algorithm by using only a subset of training data for adversarial training. To select the subset, we filter the adversarially-prone samples from the training data. We perform a simple adversarial attack on all training examples to filter this subset. In this attack, we add a small perturbation to each pixel and a few grid lines to the input image. We perform adversarial training on the adversarially-prone subset and mix it with vanilla training performed on the entire dataset. Our results show that when our method-agnostic approach is plugged into FGSM, we achieve a speedup of 3.52x on MNIST and 1.98x on the CIFAR-10 dataset with comparable robust accuracy. We also test our approach on state-of-the-art Free adversarial training and achieve a speedup of 1.2x in training time with a marginal drop in robust accuracy on the ImageNet dataset.
    AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models. (arXiv:2304.00830v2 [cs.SD] UPDATED)
    Audio editing is applicable for various purposes, such as adding background sound effects, replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-based methods achieved zero-shot audio editing by using a diffusion and denoising process conditioned on the text description of the output audio. However, these methods still have some problems: 1) they have not been trained on editing tasks and cannot ensure good editing effects; 2) they can erroneously modify audio segments that do not require editing; 3) they need a complete description of the output audio, which is not always available or necessary in practical scenarios. In this work, we propose AUDIT, an instruction-guided audio editing model based on latent diffusion models. Specifically, AUDIT has three main design features: 1) we construct triplet training data (instruction, input audio, output audio) for different audio editing tasks and train a diffusion model using instruction and input (to be edited) audio as conditions and generating output (edited) audio; 2) it can automatically learn to only modify segments that need to be edited by comparing the difference between the input and output audio; 3) it only needs edit instructions instead of full target audio descriptions as text input. AUDIT achieves state-of-the-art results in both objective and subjective metrics for several audio editing tasks (e.g., adding, dropping, replacement, inpainting, super-resolution). Demo samples are available at https://audit-demo.github.io/.
    Emerging trends in machine learning for computational fluid dynamics. (arXiv:2211.15145v2 [physics.flu-dyn] UPDATED)
    The renewed interest from the scientific community in machine learning (ML) is opening many new areas of research. Here we focus on how novel trends in ML are providing opportunities to improve the field of computational fluid dynamics (CFD). In particular, we discuss synergies between ML and CFD that have already shown benefits, and we also assess areas that are under development and may produce important benefits in the coming years. We believe that it is also important to emphasize a balanced perspective of cautious optimism for these emerging approaches
    A relaxed proximal gradient descent algorithm for convergent plug-and-play with proximal denoiser. (arXiv:2301.13731v2 [stat.ML] UPDATED)
    This paper presents a new convergent Plug-and-Play (PnP) algorithm. PnP methods are efficient iterative algorithms for solving image inverse problems formulated as the minimization of the sum of a data-fidelity term and a regularization term. PnP methods perform regularization by plugging a pre-trained denoiser in a proximal algorithm, such as Proximal Gradient Descent (PGD). To ensure convergence of PnP schemes, many works study specific parametrizations of deep denoisers. However, existing results require either unverifiable or suboptimal hypotheses on the denoiser, or assume restrictive conditions on the parameters of the inverse problem. Observing that these limitations can be due to the proximal algorithm in use, we study a relaxed version of the PGD algorithm for minimizing the sum of a convex function and a weakly convex one. When plugged with a relaxed proximal denoiser, we show that the proposed PnP-$\alpha$PGD algorithm converges for a wider range of regularization parameters, thus allowing more accurate image restoration.
    Debiasing Scores and Prompts of 2D Diffusion for Robust Text-to-3D Generation. (arXiv:2303.15413v2 [cs.CV] UPDATED)
    The view inconsistency problem in score-distilling text-to-3D generation, also known as the Janus problem, arises from the intrinsic bias of 2D diffusion models, which leads to the unrealistic generation of 3D objects. In this work, we explore score-distilling text-to-3D generation and identify the main causes of the Janus problem. Based on these findings, we propose two approaches to debias the score-distillation frameworks for robust text-to-3D generation. Our first approach, called score debiasing, involves gradually increasing the truncation value for the score estimated by 2D diffusion models throughout the optimization process. Our second approach, called prompt debiasing, identifies conflicting words between user prompts and view prompts utilizing a language model and adjusts the discrepancy between view prompts and object-space camera poses. Our experimental results show that our methods improve realism by significantly reducing artifacts and achieve a good trade-off between faithfulness to the 2D diffusion models and 3D consistency with little overhead.
    Segment Anything. (arXiv:2304.02643v1 [cs.CV])
    We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
    Reinforcement Learning for Freight Booking Control Problems. (arXiv:2102.00092v3 [math.OC] UPDATED)
    Booking control problems are sequential decision-making problems that occur in the domain of revenue management. More precisely, freight booking control focuses on the problem of deciding to accept or reject bookings: given a limited capacity, accept a booking request or reject it to reserve capacity for future bookings with potentially higher revenue. This problem can be formulated as a finite-horizon stochastic dynamic program, where accepting a set of requests results in a profit at the end of the booking period that depends on the cost of fulfilling the accepted bookings. For many freight applications, the cost of fulfilling requests is obtained by solving an operational decision-making problem, which often requires the solutions to mixed-integer linear programs. Routinely solving such operational problems when deploying reinforcement learning algorithms may be too time consuming. The majority of booking control policies are obtained by solving problem-specific mathematical programming relaxations that are often non-trivial to generalize to new problems and, in some cases, provide quite crude approximations. In this work, we propose a two-phase approach: we first train a supervised learning model to predict the objective of the operational problem, and then we deploy the model within reinforcement learning algorithms to compute control policies. This approach is general: it can be used every time the objective function of the end-of-horizon operational problem can be predicted, and it is particularly suitable to those cases where such problems are computationally hard. Furthermore, it allows one to leverage the recent advances in reinforcement learning as routinely solving the operational problem is replaced with a single prediction. Our methodology is evaluated on two booking control problems in the literature, namely, distributional logistics and airline cargo management.
    MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention. (arXiv:2211.13955v2 [cs.CR] UPDATED)
    Secure multi-party computation (MPC) enables computation directly on encrypted data and protects both data and model privacy in deep learning inference. However, existing neural network architectures, including Vision Transformers (ViTs), are not designed or optimized for MPC and incur significant latency overhead. We observe Softmax accounts for the major latency bottleneck due to a high communication complexity, but can be selectively replaced or linearized without compromising the model accuracy. Hence, in this paper, we propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC. Based on a systematic latency and accuracy evaluation of the Softmax attention and other attention variants, we propose a heterogeneous attention optimization space. We also develop a simple yet effective MPC-aware neural architecture search algorithm for fast Pareto optimization. To further boost the inference efficiency, we propose MPCViT+, to jointly optimize the Softmax attention and other network components, including GeLU, matrix multiplication, etc. With extensive experiments, we demonstrate that MPCViT achieves 1.9%, 1.3% and 4.6% higher accuracy with 6.2x, 2.9x and 1.9x latency reduction compared with baseline ViT, MPCFormer and THE-X on the Tiny-ImageNet dataset, respectively. MPCViT+ further achieves 1.2x latency reduction on CIFAR-100 dataset and reaches a better Pareto front compared with MPCViT.
    A soft nearest-neighbor framework for continual semi-supervised learning. (arXiv:2212.05102v2 [cs.CV] UPDATED)
    Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning--a setting where not all the data samples are labeled. A primary issue in this scenario is the model forgetting representations of unlabeled data and overfitting the labeled samples. We leverage the power of nearest-neighbor classifiers to nonlinearly partition the feature space and flexibly model the underlying data distribution thanks to its non-parametric nature. This enables the model to learn a strong representation for the current task, and distill relevant information from previous tasks. We perform a thorough experimental evaluation and show that our method outperforms all the existing approaches by large margins, setting a solid state of the art on the continual semi-supervised learning paradigm. For example, on CIFAR-100 we surpass several others even when using at least 30 times less supervision (0.8% vs. 25% of annotations). Finally, our method works well on both low and high resolution images and scales seamlessly to more complex datasets such as ImageNet-100. The code is publicly available on https://github.com/kangzhiq/NNCSL
    Three Variations on Variational Autoencoders. (arXiv:2212.04451v2 [cs.LG] UPDATED)
    Variational autoencoders (VAEs) are one class of generative probabilistic latent-variable models designed for inference based on known data. We develop three variations on VAEs by introducing a second parameterized encoder/decoder pair and, for one variation, an additional fixed encoder. The parameters of the encoders/decoders are to be learned with a neural network. The fixed encoder is obtained by probabilistic-PCA. The variations are compared to the Evidence Lower Bound (ELBO) approximation to the original VAE. One variation leads to an Evidence Upper Bound (EUBO) that can be used in conjunction with the original ELBO to interrogate the convergence of the VAE.
    Discovering and Explaining the Non-Causality of Deep Learning in SAR ATR. (arXiv:2304.00668v2 [cs.CV] UPDATED)
    In recent years, deep learning has been widely used in SAR ATR and achieved excellent performance on the MSTAR dataset. However, due to constrained imaging conditions, MSTAR has data biases such as background correlation, i.e., background clutter properties have a spurious correlation with target classes. Deep learning can overfit clutter to reduce training errors. Therefore, the degree of overfitting for clutter reflects the non-causality of deep learning in SAR ATR. Existing methods only qualitatively analyze this phenomenon. In this paper, we quantify the contributions of different regions to target recognition based on the Shapley value. The Shapley value of clutter measures the degree of overfitting. Moreover, we explain how data bias and model bias contribute to non-causality. Concisely, data bias leads to comparable signal-to-clutter ratios and clutter textures in training and test sets. And various model structures have different degrees of overfitting for these biases. The experimental results of various models under standard operating conditions on the MSTAR dataset support our conclusions. Our code is available at https://github.com/waterdisappear/Data-Bias-in-MSTAR.
    Sociocultural knowledge is needed for selection of shots in hate speech detection tasks. (arXiv:2304.01890v2 [cs.CL] UPDATED)
    We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for the countries of Brazil, Germany, India and Kenya, to aid training and interpretability of models. We demonstrate how our lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, we propose a method to aid shot selection for training in low-resource settings via HATELEXICON. In few-shot learning, the selection of shots is of paramount importance to model performance. In our work, we simulate a few-shot setting for German and Hindi, using HASOC data for training and the Multilingual HateCheck (MHC) as a benchmark. We show that selecting shots based on our lexicon leads to models performing better on MHC than models trained on shots sampled randomly. Thus, when given only a few training examples, using our lexicon to select shots containing more sociocultural information leads to better few-shot performance.
    Neural Architecture Search with Multimodal Fusion Methods for Diagnosing Dementia. (arXiv:2302.05894v2 [cs.LG] UPDATED)
    Alzheimer's dementia (AD) affects memory, thinking, and language, deteriorating person's life. An early diagnosis is very important as it enables the person to receive medical help and ensure quality of life. Therefore, leveraging spontaneous speech in conjunction with machine learning methods for recognizing AD patients has emerged into a hot topic. Most of the previous works employ Convolutional Neural Networks (CNNs), to process the input signal. However, finding a CNN architecture is a time-consuming process and requires domain expertise. Moreover, the researchers introduce early and late fusion approaches for fusing different modalities or concatenate the representations of the different modalities during training, thus the inter-modal interactions are not captured. To tackle these limitations, first we exploit a Neural Architecture Search (NAS) method to automatically find a high performing CNN architecture. Next, we exploit several fusion methods, including Multimodal Factorized Bilinear Pooling and Tucker Decomposition, to combine both speech and text modalities. To the best of our knowledge, there is no prior work exploiting a NAS approach and these fusion methods in the task of dementia detection from spontaneous speech. We perform extensive experiments on the ADReSS Challenge dataset and show the effectiveness of our approach over state-of-the-art methods.
    Label Attention Network for sequential multi-label classification: you were looking at a wrong self-attention. (arXiv:2303.00280v2 [cs.LG] UPDATED)
    Most of the available user information can be represented as a sequence of timestamped events. Each event is assigned a set of categorical labels whose future structure is of great interest. For instance, our goal is to predict a group of items in the next customer's purchase or tomorrow's client transactions. This is a multi-label classification problem for sequential data. Modern approaches focus on transformer architecture for sequential data introducing self-attention for the elements in a sequence. In that case, we take into account events' time interactions but lose information on label inter-dependencies. Motivated by this shortcoming, we propose leveraging a self-attention mechanism over labels preceding the predicted step. As our approach is a Label-Attention NETwork, we call it LANET. Experimental evidence suggests that LANET outperforms the established models' performance and greatly captures interconnections between labels. For example, the micro-AUC of our approach is $0.9536$ compared to $0.7501$ for a vanilla transformer. We provide an implementation of LANET to facilitate its wider usage.
    CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs. (arXiv:2201.01863v3 [cs.LG] UPDATED)
    Need for the efficient processing of neural networks has given rise to the development of hardware accelerators. The increased adoption of specialized hardware has highlighted the need for more agile design flows for hardware-software co-design and domain-specific optimizations. In this paper, we present CFU Playground: a full-stack open-source framework that enables rapid and iterative design and evaluation of machine learning (ML) accelerators for embedded ML systems. Our tool provides a completely open-source end-to-end flow for hardware-software co-design on FPGAs and future systems research. This full-stack framework gives the users access to explore experimental and bespoke architectures that are customized and co-optimized for embedded ML. Our rapid, deploy-profile-optimization feedback loop lets ML hardware and software developers achieve significant returns out of a relatively small investment in customization. Using CFU Playground's design and evaluation loop, we show substantial speedups between 55$\times$ and 75$\times$. The soft CPU coupled with the accelerator opens up a new, rich design space between the two components that we explore in an automated fashion using Vizier, an open-source black-box optimization service.
    Vision Transformers are Parameter-Efficient Audio-Visual Learners. (arXiv:2212.07983v2 [cs.CV] UPDATED)
    Vision transformers (ViTs) have achieved impressive results on various computer vision tasks in the last several years. In this work, we study the capability of frozen ViTs, pretrained only on visual data, to generalize to audio-visual data without finetuning any of its original parameters. To do so, we propose a latent audio-visual hybrid (LAVISH) adapter that adapts pretrained ViTs to audio-visual tasks by injecting a small number of trainable parameters into every layer of a frozen ViT. To efficiently fuse visual and audio cues, our LAVISH adapter uses a small set of latent tokens, which form an attention bottleneck, thus, eliminating the quadratic cost of standard cross-attention. Compared to the existing modality-specific audio-visual methods, our approach achieves competitive or even better performance on various audio-visual tasks while using fewer tunable parameters and without relying on costly audio pretraining or external audio encoders. Our code is available at https://genjib.github.io/project_page/LAVISH/
    AutoRL Hyperparameter Landscapes. (arXiv:2304.02396v1 [cs.LG])
    Although Reinforcement Learning (RL) has shown to be capable of producing impressive results, its use is limited by the impact of its hyperparameters on performance. This often makes it difficult to achieve good results in practice. Automated RL (AutoRL) addresses this difficulty, yet little is known about the dynamics of the hyperparameter landscapes that hyperparameter optimization (HPO) methods traverse in search of optimal configurations. In view of existing AutoRL approaches dynamically adjusting hyperparameter configurations, we propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points in time throughout training. Addressing an important open question on the legitimacy of such dynamic AutoRL approaches, we provide thorough empirical evidence that the hyperparameter landscapes strongly vary over time across representative algorithms from RL literature (DQN and SAC) in different kinds of environments (Cartpole and Hopper). This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for more insights on AutoRL problems that can be gained through landscape analyses.
    Efficient Quantum Algorithms for Quantum Optimal Control. (arXiv:2304.02613v1 [quant-ph])
    In this paper, we present efficient quantum algorithms that are exponentially faster than classical algorithms for solving the quantum optimal control problem. This problem involves finding the control variable that maximizes a physical quantity at time $T$, where the system is governed by a time-dependent Schr\"odinger equation. This type of control problem also has an intricate relation with machine learning. Our algorithms are based on a time-dependent Hamiltonian simulation method and a fast gradient-estimation algorithm. We also provide a comprehensive error analysis to quantify the total error from various steps, such as the finite-dimensional representation of the control function, the discretization of the Schr\"odinger equation, the numerical quadrature, and optimization. Our quantum algorithms require fault-tolerant quantum computers.
    Dynamic Bandits with an Auto-Regressive Temporal Structure. (arXiv:2210.16386v2 [cs.LG] UPDATED)
    Multi-armed bandit (MAB) problems are mainly studied under two extreme settings known as stochastic and adversarial. These two settings, however, do not capture realistic environments such as search engines and marketing and advertising, in which rewards stochastically change in time. Motivated by that, we introduce and study a dynamic MAB problem with stochastic temporal structure, where the expected reward of each arm is governed by an auto-regressive (AR) model. Due to the dynamic nature of the rewards, simple "explore and commit" policies fail, as all arms have to be explored continuously over time. We formalize this by characterizing a per-round regret lower bound, where the regret is measured against a strong (dynamic) benchmark. We then present an algorithm whose per-round regret almost matches our regret lower bound. Our algorithm relies on two mechanisms: (i) alternating between recently pulled arms and unpulled arms with potential, and (ii) restarting. These mechanisms enable the algorithm to dynamically adapt to changes and discard irrelevant past information at a suitable rate. In numerical studies, we further demonstrate the strength of our algorithm under non-stationary settings.
    Mapping historical forest biomass for stock-change assessments at parcel to landscape scales. (arXiv:2304.02632v1 [stat.AP])
    Understanding historical forest dynamics, specifically changes in forest biomass and carbon stocks, has become critical for assessing current forest climate benefits and projecting future benefits under various policy, regulatory, and stewardship scenarios. Carbon accounting frameworks based exclusively on national forest inventories are limited to broad-scale estimates, but model-based approaches that combine these inventories with remotely sensed data can yield contiguous fine-resolution maps of forest biomass and carbon stocks across landscapes over time. Here we describe a fundamental step in building a map-based stock-change framework: mapping historical forest biomass at fine temporal and spatial resolution (annual, 30m) across all of New York State (USA) from 1990 to 2019, using freely available data and open-source tools. Using Landsat imagery, US Forest Service Forest Inventory and Analysis (FIA) data, and off-the-shelf LiDAR collections we developed three modeling approaches for mapping historical forest aboveground biomass (AGB): training on FIA plot-level AGB estimates (direct), training on LiDAR-derived AGB maps (indirect), and an ensemble averaging predictions from the direct and indirect models. Model prediction surfaces (maps) were tested against FIA estimates at multiple scales. All three approaches produced viable outputs, yet tradeoffs were evident in terms of model complexity, map accuracy, saturation, and fine-scale pattern representation. The resulting map products can help identify where, when, and how forest carbon stocks are changing as a result of both anthropogenic and natural drivers alike. These products can thus serve as inputs to a wide range of applications including stock-change assessments, monitoring reporting and verification frameworks, and prioritizing parcels for protection or enrollment in improved management programs.
    Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding. (arXiv:2203.05711v4 [cs.CV] UPDATED)
    Despite recent advances of AI, story understanding remains an open and under-investigated problem. We collect, preprocess, and publicly release a video-language story dataset, Synopses of Movie Narratives (SyMoN), containing 5,193 video summaries of popular movies and TV series with a total length of 869 hours. SyMoN captures naturalistic storytelling videos made by human creators and intended for a human audience. As a prototypical and naturalistic story dataset, SyMoN features high coverage of multimodal story events and abundant mental-state descriptions. Its use of storytelling techniques cause cross-domain semantic gaps that provide appropriate challenges to existing models. We establish benchmarks on video-text retrieval and zero-shot alignment on movie summary videos, which showcase the importance of in-domain data and long-term memory in story understanding. With SyMoN, we hope to lay the groundwork for progress in multimodal story understanding.
    On Complexity of 1-Center in Various Metrics. (arXiv:2112.03222v2 [cs.CC] UPDATED)
    We consider the classic 1-center problem: Given a set $P$ of $n$ points in a metric space find the point in $P$ that minimizes the maximum distance to the other points of $P$. We study the complexity of this problem in $d$-dimensional $\ell_p$-metrics and in edit and Ulam metrics over strings of length $d$. Our results for the 1-center problem may be classified based on $d$ as follows. $\bullet$ Small $d$: Assuming the hitting set conjecture (HSC), we show that when $d=\omega(\log n)$, no subquadratic algorithm can solve 1-center problem in any of the $\ell_p$-metrics, or in edit or Ulam metrics. $\bullet$ Large $d$: When $d=\Omega(n)$, we extend our conditional lower bound to rule out subquartic algorithms for 1-center problem in edit metric (assuming Quantified SETH). On the other hand, we give a $(1+\epsilon)$-approximation for 1-center in Ulam metric with running time $\tilde{O_{\varepsilon}}(nd+n^2\sqrt{d})$. We also strengthen some of the above lower bounds by allowing approximations or by reducing the dimension $d$, but only against a weaker class of algorithms which list all requisite solutions. Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where given a set of $n$ strings each of length $n$, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
    Approximated Multi-Agent Fitted Q Iteration. (arXiv:2104.09343v5 [cs.LG] UPDATED)
    We formulate an efficient approximation for multi-agent batch reinforcement learning, the approximated multi-agent fitted Q iteration (AMAFQI). We present a detailed derivation of our approach. We propose an iterative policy search and show that it yields a greedy policy with respect to multiple approximations of the centralized, learned Q-function. In each iteration and policy evaluation, AMAFQI requires a number of computations that scales linearly with the number of agents whereas the analogous number of computations increase exponentially for the fitted Q iteration (FQI), a commonly used approaches in batch reinforcement learning. This property of AMAFQI is fundamental for the design of a tractable multi-agent approach. We evaluate the performance of AMAFQI and compare it to FQI in numerical simulations. The simulations illustrate the significant computation time reduction when using AMAFQI instead of FQI in multi-agent problems and corroborate the similar performance of both approaches.
    Existence and Minimax Theorems for Adversarial Surrogate Risks in Binary Classification. (arXiv:2206.09098v2 [cs.LG] UPDATED)
    Adversarial training is one of the most popular methods for training methods robust to adversarial attacks, however, it is not well-understood from a theoretical perspective. We prove and existence, regularity, and minimax theorems for adversarial surrogate risks. Our results explain some empirical observations on adversarial robustness from prior work and suggest new directions in algorithm development. Furthermore, our results extend previously known existence and minimax theorems for the adversarial classification risk to surrogate risks.
    A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes. (arXiv:2209.10095v2 [cs.LG] UPDATED)
    In many classification models, data is discretized to better estimate its distribution. Existing discretization methods often target at maximizing the discriminant power of discretized data, while overlooking the fact that the primary target of data discretization in classification is to improve the generalization performance. As a result, the data tend to be over-split into many small bins since the data without discretization retain the maximal discriminant information. Thus, we propose a Max-Dependency-Min-Divergence (MDmD) criterion that maximizes both the discriminant information and generalization ability of the discretized data. More specifically, the Max-Dependency criterion maximizes the statistical dependency between the discretized data and the classification variable while the Min-Divergence criterion explicitly minimizes the JS-divergence between the training data and the validation data for a given discretization scheme. The proposed MDmD criterion is technically appealing, but it is difficult to reliably estimate the high-order joint distributions of attributes and the classification variable. We hence further propose a more practical solution, Max-Relevance-Min-Divergence (MRmD) discretization scheme, where each attribute is discretized separately, by simultaneously maximizing the discriminant information and the generalization ability of the discretized data. The proposed MRmD is compared with the state-of-the-art discretization algorithms under the naive Bayes classification framework on 45 machine-learning benchmark datasets. It significantly outperforms all the compared methods on most of the datasets.
    Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation. (arXiv:2304.00971v2 [cs.CV] UPDATED)
    This report serves as a supplementary document for TaskPrompter, detailing its implementation on a new joint 2D-3D multi-task learning benchmark based on Cityscapes-3D. TaskPrompter presents an innovative multi-task prompting framework that unifies the learning of (i) task-generic representations, (ii) task-specific representations, and (iii) cross-task interactions, as opposed to previous approaches that separate these learning objectives into different network modules. This unified approach not only reduces the need for meticulous empirical structure design but also significantly enhances the multi-task network's representation learning capability, as the entire model capacity is devoted to optimizing the three objectives simultaneously. TaskPrompter introduces a new multi-task benchmark based on Cityscapes-3D dataset, which requires the multi-task model to concurrently generate predictions for monocular 3D vehicle detection, semantic segmentation, and monocular depth estimation. These tasks are essential for achieving a joint 2D-3D understanding of visual scenes, particularly in the development of autonomous driving systems. On this challenging benchmark, our multi-task model demonstrates strong performance compared to single-task state-of-the-art methods and establishes new state-of-the-art results on the challenging 3D detection and depth estimation tasks.
    Learning Data Representations with Joint Diffusion Models. (arXiv:2301.13622v2 [cs.LG] UPDATED)
    Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train. In this work, we depart from a set of empirical observations that indicate the usefulness of internal representations built by contemporary deep diffusion-based generative models not only for generating but also predicting. We then propose to extend the vanilla diffusion model with a classifier that allows for stable joint end-to-end training with shared parameterization between those objectives. The resulting joint diffusion model outperforms recent state-of-the-art hybrid methods in terms of both classification and generation quality on all evaluated benchmarks. On top of our joint training approach, we present how we can directly benefit from shared generative and discriminative representations by introducing a method for visual counterfactual explanations.
    Rethinking Initialization of the Sinkhorn Algorithm. (arXiv:2206.07630v2 [stat.ML] UPDATED)
    While the optimal transport (OT) problem was originally formulated as a linear program, the addition of entropic regularization has proven beneficial both computationally and statistically, for many applications. The Sinkhorn fixed-point algorithm is the most popular approach to solve this regularized problem, and, as a result, multiple attempts have been made to reduce its runtime using, e.g., annealing in the regularization parameter, momentum or acceleration. The premise of this work is that initialization of the Sinkhorn algorithm has received comparatively little attention, possibly due to two preconceptions: since the regularized OT problem is convex, it may not be worth crafting a good initialization, since any is guaranteed to work; secondly, because the outputs of the Sinkhorn algorithm are often unrolled in end-to-end pipelines, a data-dependent initialization would bias Jacobian computations. We challenge this conventional wisdom, and show that data-dependent initializers result in dramatic speed-ups, with no effect on differentiability as long as implicit differentiation is used. Our initializations rely on closed-forms for exact or approximate OT solutions that are known in the 1D, Gaussian or GMM settings. They can be used with minimal tuning, and result in consistent speed-ups for a wide variety of OT problems.
    Learning-based Design of Luenberger Observers for Autonomous Nonlinear Systems. (arXiv:2210.01476v2 [math.OC] UPDATED)
    Designing Luenberger observers for nonlinear systems involves the challenging task of transforming the state to an alternate coordinate system, possibly of higher dimensions, where the system is asymptotically stable and linear up to output injection. The observer then estimates the system's state in the original coordinates by inverting the transformation map. However, finding a suitable injective transformation whose inverse can be derived remains a primary challenge for general nonlinear systems. We propose a novel approach that uses supervised physics-informed neural networks to approximate both the transformation and its inverse. Our method exhibits superior generalization capabilities to contemporary methods and demonstrates robustness to both neural network's approximation errors and system uncertainties.
    DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images. (arXiv:2304.02389v1 [eess.IV])
    Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and scientific benchmarking for diabetic retinopathy analysis using UW-OCTA images, we organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). The challenge consists of three tasks: segmentation of DR lesions, image quality assessment and DR grading. The scientific community responded positively to the challenge, with 11, 12, and 13 teams from geographically diverse institutes submitting different solutions in these three tasks, respectively. This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge. The obtained results from top algorithms indicate the importance of data augmentation, model architecture and ensemble of networks in improving the performance of deep learning models. These findings have the potential to enable new developments in diabetic retinopathy analysis. The challenge remains open for post-challenge registrations and submissions for benchmarking future methodology developments.
    Boosting Graph Neural Networks via Adaptive Knowledge Distillation. (arXiv:2210.05920v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have shown remarkable performance on diverse graph mining tasks. Although different GNNs can be unified as the same message passing framework, they learn complementary knowledge from the same graph. Knowledge distillation (KD) is developed to combine the diverse knowledge from multiple models. It transfers knowledge from high-capacity teachers to a lightweight student. However, to avoid oversmoothing, GNNs are often shallow, which deviates from the setting of KD. In this context, we revisit KD by separating its benefits from model compression and emphasizing its power of transferring knowledge. To this end, we need to tackle two challenges: how to transfer knowledge from compact teachers to a student with the same capacity; and, how to exploit student GNN's own strength to learn knowledge. In this paper, we propose a novel adaptive KD framework, called BGNN, which sequentially transfers knowledge from multiple GNNs into a student GNN. We also introduce an adaptive temperature module and a weight boosting module. These modules guide the student to the appropriate knowledge for effective learning. Extensive experiments have demonstrated the effectiveness of BGNN. In particular, we achieve up to 3.05% improvement for node classification and 6.35% improvement for graph classification over vanilla GNNs.
    A dynamic Bayesian optimized active recommender system for curiosity-driven Human-in-the-loop automated experiments. (arXiv:2304.02484v1 [cs.LG])
    Optimization of experimental materials synthesis and characterization through active learning methods has been growing over the last decade, with examples ranging from measurements of diffraction on combinatorial alloys at synchrotrons, to searches through chemical space with automated synthesis robots for perovskites. In virtually all cases, the target property of interest for optimization is defined apriori with limited human feedback during operation. In contrast, here we present the development of a new type of human in the loop experimental workflow, via a Bayesian optimized active recommender system (BOARS), to shape targets on the fly, employing human feedback. We showcase examples of this framework applied to pre-acquired piezoresponse force spectroscopy of a ferroelectric thin film, and then implement this in real time on an atomic force microscope, where the optimization proceeds to find symmetric piezoresponse amplitude hysteresis loops. It is found that such features appear more affected by subsurface defects than the local domain structure. This work shows the utility of human-augmented machine learning approaches for curiosity-driven exploration of systems across experimental domains. The analysis reported here is summarized in Colab Notebook for the purpose of tutorial and application to other data: https://github.com/arpanbiswas52/varTBO
    CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for Non-Contrast to Contrast CT Translation. (arXiv:2110.06400v3 [eess.IV] UPDATED)
    We propose a novel approach to translate unpaired contrast computed tomography (CT) scans to non-contrast CT scans and the other way around. Solving this task has two important applications: (i) to automatically generate contrast CT scans for patients for whom injecting contrast substance is not an option, and (ii) to enhance the alignment between contrast and non-contrast CT by reducing the differences induced by the contrast substance before registration. Our approach is based on cycle-consistent generative adversarial convolutional transformers, for short, CyTran. Our neural model can be trained on unpaired images, due to the integration of a multi-level cycle-consistency loss. Aside from the standard cycle-consistency loss applied at the image level, we propose to apply additional cycle-consistency losses between intermediate feature representations, which enforces the model to be cycle-consistent at multiple representations levels, leading to superior results. To deal with high-resolution images, we design a hybrid architecture based on convolutional and multi-head attention layers. In addition, we introduce a novel data set, Coltea-Lung-CT-100W, containing 100 3D triphasic lung CT scans (with a total of 37,290 images) collected from 100 female patients (there is one examination per patient). Each scan contains three phases (non-contrast, early portal venous, and late arterial), allowing us to perform experiments to compare our novel approach with state-of-the-art methods for image style transfer. Our empirical results show that CyTran outperforms all competing methods. Moreover, we show that CyTran can be employed as a preliminary step to improve a state-of-the-art medical image alignment method. We release our novel model and data set as open source at https://github.com/ristea/cycle-transformer.
    Adversarial robustness of VAEs through the lens of local geometry. (arXiv:2208.03923v2 [cs.LG] UPDATED)
    In an unsupervised attack on variational autoencoders (VAEs), an adversary finds a small perturbation in an input sample that significantly changes its latent space encoding, thereby compromising the reconstruction for a fixed decoder. A known reason for such vulnerability is the distortions in the latent space resulting from a mismatch between approximated latent posterior and a prior distribution. Consequently, a slight change in an input sample can move its encoding to a low/zero density region in the latent space resulting in an unconstrained generation. This paper demonstrates that an optimal way for an adversary to attack VAEs is to exploit a directional bias of a stochastic pullback metric tensor induced by the encoder and decoder networks. The pullback metric tensor of an encoder measures the change in infinitesimal latent volume from an input to a latent space. Thus, it can be viewed as a lens to analyse the effect of input perturbations leading to latent space distortions. We propose robustness evaluation scores using the eigenspectrum of a pullback metric tensor. Moreover, we empirically show that the scores correlate with the robustness parameter $\beta$ of the $\beta-$VAE. Since increasing $\beta$ also degrades reconstruction quality, we demonstrate a simple alternative using \textit{mixup} training to fill the empty regions in the latent space, thus improving robustness with improved reconstruction.
    MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy. (arXiv:2210.10842v2 [cs.CV] UPDATED)
    Recently, there has been tremendous interest in industry 4.0 infrastructure to address labor shortages in global supply chains. Deploying artificial intelligence-enabled robotic bin picking systems in real world has become particularly important for reducing stress and physical demands of workers while increasing speed and efficiency of warehouses. To this end, artificial intelligence-enabled robotic bin picking systems may be used to automate order picking, but with the risk of causing expensive damage during an abnormal event such as sensor failure. As such, reliability becomes a critical factor for translating artificial intelligence research to real world applications and products. In this paper, we propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet) for tackling object detection and segmentation for robotic bin picking using data from different modalities. This is the first system that introduces the concept of multimodal redundancy to address sensor failure issues during deployment. In particular, we realize the multimodal redundancy framework with a gate fusion module and dynamic ensemble learning. Finally, we present a new label-free multi-modal consistency (MC) score that utilizes the output from all modalities to measure the overall system output reliability and uncertainty. Through experiments, we demonstrate that in an event of missing modality, our system provides a much more reliable performance compared to baseline models. We also demonstrate that our MC score is a more reliability indicator for outputs during inference time compared to the model generated confidence scores that are often over-confident.
    ECG Feature Importance Rankings: Cardiologists vs. Algorithms. (arXiv:2304.02577v1 [physics.med-ph])
    Feature importance methods promise to provide a ranking of features according to importance for a given classification task. A wide range of methods exist but their rankings often disagree and they are inherently difficult to evaluate due to a lack of ground truth beyond synthetic datasets. In this work, we put feature importance methods to the test on real-world data in the domain of cardiology, where we try to distinguish three specific pathologies from healthy subjects based on ECG features comparing to features used in cardiologists' decision rules as ground truth. Some methods generally performed well and others performed poorly, while some methods did well on some but not all of the problems considered.
    Self-Distillation for Gaussian Process Regression and Classification. (arXiv:2304.02641v1 [stat.ML])
    We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC); data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques for machine learning, and refits a model on deterministic predictions from the teacher, while the distribution-centric approach, re-uses the full probabilistic posterior for the next iteration. By analyzing the properties of these approaches, we show that the data-centric approach for GPR closely relates to known results for self-distillation of kernel ridge regression and that the distribution-centric approach for GPR corresponds to ordinary GPR with a very particular choice of hyperparameters. Furthermore, we demonstrate that the distribution-centric approach for GPC approximately corresponds to data duplication and a particular scaling of the covariance and that the data-centric approach for GPC requires redefining the model from a Binomial likelihood to a continuous Bernoulli likelihood to be well-specified. To the best of our knowledge, our proposed approaches are the first to formulate knowledge distillation specifically for Gaussian Process models.
    Fully Variational Noise-Contrastive Estimation. (arXiv:2304.02473v1 [cs.LG])
    By using the underlying theory of proper scoring rules, we design a family of noise-contrastive estimation (NCE) methods that are tractable for latent variable models. Both terms in the underlying NCE loss, the one using data samples and the one using noise samples, can be lower-bounded as in variational Bayes, therefore we call this family of losses fully variational noise-contrastive estimation. Variational autoencoders are a particular example in this family and therefore can be also understood as separating real data from synthetic samples using an appropriate classification loss. We further discuss other instances in this family of fully variational NCE objectives and indicate differences in their empirical behavior.
    Generative Adversarial Networks (GANs Survey): Challenges, Solutions, and Future Directions. (arXiv:2005.00065v4 [cs.LG] UPDATED)
    Generative Adversarial Networks (GANs) is a novel class of deep generative models which has recently gained significant attention. GANs learns complex and high-dimensional distributions implicitly over images, audio, and data. However, there exists major challenges in training of GANs, i.e., mode collapse, non-convergence and instability, due to inappropriate design of network architecture, use of objective function and selection of optimization algorithm. Recently, to address these challenges, several solutions for better design and optimization of GANs have been investigated based on techniques of re-engineered network architectures, new objective functions and alternative optimization algorithms. To the best of our knowledge, there is no existing survey that has particularly focused on broad and systematic developments of these solutions. In this study, we perform a comprehensive survey of the advancements in GANs design and optimization solutions proposed to handle GANs challenges. We first identify key research issues within each design and optimization technique and then propose a new taxonomy to structure solutions by key research issues. In accordance with the taxonomy, we provide a detailed discussion on different GANs variants proposed within each solution and their relationships. Finally, based on the insights gained, we present the promising research directions in this rapidly growing field.
    A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification. (arXiv:2211.15259v2 [cs.CV] UPDATED)
    Reliable application of machine learning-based decision systems in the wild is one of the major challenges currently investigated by the field. A large portion of established approaches aims to detect erroneous predictions by means of assigning confidence scores. This confidence may be obtained by either quantifying the model's predictive uncertainty, learning explicit scoring functions, or assessing whether the input is in line with the training distribution. Curiously, while these approaches all state to address the same eventual goal of detecting failures of a classifier upon real-life application, they currently constitute largely separated research fields with individual evaluation protocols, which either exclude a substantial part of relevant methods or ignore large parts of relevant failure sources. In this work, we systematically reveal current pitfalls caused by these inconsistencies and derive requirements for a holistic and realistic evaluation of failure detection. To demonstrate the relevance of this unified perspective, we present a large-scale empirical study for the first time enabling benchmarking confidence scoring functions w.r.t all relevant methods and failure sources. The revelation of a simple softmax response baseline as the overall best performing method underlines the drastic shortcomings of current evaluation in the abundance of publicized research on confidence scoring. Code and trained models are at https://github.com/IML-DKFZ/fd-shifts.
    A Semi-Supervised Adaptive Discriminative Discretization Method Improving Discrimination Power of Regularized Naive Bayes. (arXiv:2111.10983v3 [cs.LG] UPDATED)
    Recently, many improved naive Bayes methods have been developed with enhanced discrimination capabilities. Among them, regularized naive Bayes (RNB) produces excellent performance by balancing the discrimination power and generalization capability. Data discretization is important in naive Bayes. By grouping similar values into one interval, the data distribution could be better estimated. However, existing methods including RNB often discretize the data into too few intervals, which may result in a significant information loss. To address this problem, we propose a semi-supervised adaptive discriminative discretization framework for naive Bayes, which could better estimate the data distribution by utilizing both labeled data and unlabeled data through pseudo-labeling techniques. The proposed method also significantly reduces the information loss during discretization by utilizing an adaptive discriminative discretization scheme, and hence greatly improves the discrimination power of classifiers. The proposed RNB+, i.e., regularized naive Bayes utilizing the proposed discretization framework, is systematically evaluated on a wide range of machine-learning datasets. It significantly and consistently outperforms state-of-the-art NB classifiers.
    GraphTune: A Learning-based Graph Generative Model with Tunable Structural Features. (arXiv:2201.11494v3 [cs.LG] UPDATED)
    Generative models for graphs have been actively studied for decades, and they have a wide range of applications. Recently, learning-based graph generation that reproduces real-world graphs has been attracting the attention of many researchers. Although several generative models that utilize modern machine learning technologies have been proposed, conditional generation of general graphs has been less explored in the field. In this paper, we propose a generative model that allows us to tune the value of a global-level structural feature as a condition. Our model, called GraphTune, makes it possible to tune the value of any structural feature of generated graphs using Long Short Term Memory (LSTM) and a Conditional Variational AutoEncoder (CVAE). We performed comparative evaluations of GraphTune and conventional models on a real graph dataset. The evaluations show that GraphTune makes it possible to more clearly tune the value of a global-level structural feature better than conventional models.
    Collective Intelligence for 2D Push Manipulations with Mobile Robots. (arXiv:2211.15136v3 [cs.RO] UPDATED)
    While natural systems often present collective intelligence that allows them to self-organize and adapt to changes, the equivalent is missing in most artificial systems. We explore the possibility of such a system in the context of cooperative 2D push manipulations using mobile robots. Although conventional works demonstrate potential solutions for the problem in restricted settings, they have computational and learning difficulties. More importantly, these systems do not possess the ability to adapt when facing environmental changes. In this work, we show that by distilling a planner derived from a differentiable soft-body physics simulator into an attention-based neural network, our multi-robot push manipulation system achieves better performance than baselines. In addition, our system also generalizes to configurations not seen during training and is able to adapt toward task completions when external turbulence and environmental changes are applied. Supplementary videos can be found on our project website: https://sites.google.com/view/ciom/home
    Diffusion Schr\"odinger Bridge with Applications to Score-Based Generative Modeling. (arXiv:2106.01357v5 [stat.ML] UPDATED)
    Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schr\"odinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).
    Quantifying the Roles of Visual, Linguistic, and Visual-Linguistic Complexity in Verb Acquisition. (arXiv:2304.02492v1 [cs.CL])
    Children typically learn the meanings of nouns earlier than the meanings of verbs. However, it is unclear whether this asymmetry is a result of complexity in the visual structure of categories in the world to which language refers, the structure of language itself, or the interplay between the two sources of information. We quantitatively test these three hypotheses regarding early verb learning by employing visual and linguistic representations of words sourced from large-scale pre-trained artificial neural networks. Examining the structure of both visual and linguistic embedding spaces, we find, first, that the representation of verbs is generally more variable and less discriminable within domain than the representation of nouns. Second, we find that if only one learning instance per category is available, visual and linguistic representations are less well aligned in the verb system than in the noun system. However, in parallel with the course of human language development, if multiple learning instances per category are available, visual and linguistic representations become almost as well aligned in the verb system as in the noun system. Third, we compare the relative contributions of factors that may predict learning difficulty for individual words. A regression analysis reveals that visual variability is the strongest factor that internally drives verb learning, followed by visual-linguistic alignment and linguistic variability. Based on these results, we conclude that verb acquisition is influenced by all three sources of complexity, but that the variability of visual structure poses the most significant challenge for verb learning.
    Optimism Based Exploration in Large-Scale Recommender Systems. (arXiv:2304.02572v1 [cs.IR])
    Bandit learning algorithms have been an increasingly popular design choice for recommender systems. Despite the strong interest in bandit learning from the community, there remains multiple bottlenecks that prevent many bandit learning approaches from productionalization. Two of the most important bottlenecks are scaling to multi-task and A/B testing. Classic bandit algorithms, especially those leveraging contextual information, often requires reward for uncertainty estimation, which hinders their adoptions in multi-task recommender systems. Moreover, different from supervised learning algorithms, bandit learning algorithms emphasize greatly on the data collection process through their explorative nature. Such explorative behavior induces unfair evaluation for bandit learning agents in a classic A/B test setting. In this work, we present a novel design of production bandit learning life-cycle for recommender systems, along with a novel set of metrics to measure their efficiency in user exploration. We show through large-scale production recommender system experiments and in-depth analysis that our bandit agent design improves personalization for the production recommender system and our experiment design fairly evaluates the performance of bandit learning algorithms.
    Physics-Inspired Interpretability Of Machine Learning Models. (arXiv:2304.02381v1 [cs.LG])
    The ability to explain decisions made by machine learning models remains one of the most significant hurdles towards widespread adoption of AI in highly sensitive areas such as medicine, cybersecurity or autonomous driving. Great interest exists in understanding which features of the input data prompt model decision making. In this contribution, we propose a novel approach to identify relevant features of the input data, inspired by methods from the energy landscapes field, developed in the physical sciences. By identifying conserved weights within groups of minima of the loss landscapes, we can identify the drivers of model decision making. Analogues to this idea exist in the molecular sciences, where coordinate invariants or order parameters are employed to identify critical features of a molecule. However, no such approach exists for machine learning loss landscapes. We will demonstrate the applicability of energy landscape methods to machine learning models and give examples, both synthetic and from the real world, for how these methods can help to make models more interpretable.
    Decentralized gradient descent maximization method for composite nonconvex strongly-concave minimax problems. (arXiv:2304.02441v1 [math.OC])
    Minimax problems have recently attracted a lot of research interests. A few efforts have been made to solve decentralized nonconvex strongly-concave (NCSC) minimax-structured optimization; however, all of them focus on smooth problems with at most a constraint on the maximization variable. In this paper, we make the first attempt on solving composite NCSC minimax problems that can have convex nonsmooth terms on both minimization and maximization variables. Our algorithm is designed based on a novel reformulation of the decentralized minimax problem that introduces a multiplier to absorb the dual consensus constraint. The removal of dual consensus constraint enables the most aggressive (i.e., local maximization instead of a gradient ascent step) dual update that leads to the benefit of taking a larger primal stepsize and better complexity results. In addition, the decoupling of the nonsmoothness and consensus on the dual variable eases the analysis of a decentralized algorithm; thus our reformulation creates a new way for interested researchers to design new (and possibly more efficient) decentralized methods on solving NCSC minimax problems. We show a global convergence result of the proposed algorithm and an iteration complexity result to produce a (near) stationary point of the reformulation. Moreover, a relation is established between the (near) stationarities of the reformulation and the original formulation. With this relation, we show that when the dual regularizer is smooth, our algorithm can have lower complexity results (with reduced dependence on a condition number) than existing ones to produce a near-stationary point of the original formulation. Numerical experiments are conducted on a distributionally robust logistic regression to demonstrate the performance of the proposed algorithm.
    Initialization Approach for Nonlinear State-Space Identification via the Subspace Encoder Approach. (arXiv:2304.02119v1 [eess.SY])
    The SUBNET neural network architecture has been developed to identify nonlinear state-space models from input-output data. To achieve this, it combines the rolled-out nonlinear state-space equations and a state encoder function, both parameterised as a neural network. The encoder function is introduced to reconstruct the current state from past input-output data. Hence it enables the forward simulation of the rolled-out state-space model. While this approach has shown to provide high-accuracy and consistent model estimation, its convergence can be significantly improved by efficient initialization of the training process. This paper focuses on such an initialisation of the subspace encoder approach using the Best Linear Approximation (BLA). Using the BLA provided state-space matrices and its associated reconstructability map both the state-transition part of the network and the encoder are initialized. The performance of the improved initialisation scheme is evaluated on a Wiener-Hammerstein simulation example and a benchmark dataset. The results show that for a weakly nonlinear system, the proposed initialisation based on the linear reconstructability map results in a faster convergence and a better model quality.
    Query lower bounds for log-concave sampling. (arXiv:2304.02599v1 [math.ST])
    Log-concave sampling has witnessed remarkable algorithmic advances in recent years, but the corresponding problem of proving lower bounds for this task has remained elusive, with lower bounds previously known only in dimension one. In this work, we establish the following query lower bounds: (1) sampling from strongly log-concave and log-smooth distributions in dimension $d\ge 2$ requires $\Omega(\log \kappa)$ queries, which is sharp in any constant dimension, and (2) sampling from Gaussians in dimension $d$ (hence also from general log-concave and log-smooth distributions in dimension $d$) requires $\widetilde \Omega(\min(\sqrt\kappa \log d, d))$ queries, which is nearly sharp for the class of Gaussians. Here $\kappa$ denotes the condition number of the target distribution. Our proofs rely upon (1) a multiscale construction inspired by work on the Kakeya conjecture in harmonic analysis, and (2) a novel reduction that demonstrates that block Krylov algorithms are optimal for this problem, as well as connections to lower bound techniques based on Wishart matrices developed in the matrix-vector query literature.
    Effective control of two-dimensional Rayleigh--B\'enard convection: invariant multi-agent reinforcement learning is all you need. (arXiv:2304.02370v1 [physics.flu-dyn])
    Rayleigh-B\'enard convection (RBC) is a recurrent phenomenon in several industrial and geoscience flows and a well-studied system from a fundamental fluid-mechanics viewpoint. However, controlling RBC, for example by modulating the spatial distribution of the bottom-plate heating in the canonical RBC configuration, remains a challenging topic for classical control-theory methods. In the present work, we apply deep reinforcement learning (DRL) for controlling RBC. We show that effective RBC control can be obtained by leveraging invariant multi-agent reinforcement learning (MARL), which takes advantage of the locality and translational invariance inherent to RBC flows inside wide channels. The MARL framework applied to RBC allows for an increase in the number of control segments without encountering the curse of dimensionality that would result from a naive increase in the DRL action-size dimension. This is made possible by the MARL ability for re-using the knowledge generated in different parts of the RBC domain. We show in a case study that MARL DRL is able to discover an advanced control strategy that destabilizes the spontaneous RBC double-cell pattern, changes the topology of RBC by coalescing adjacent convection cells, and actively controls the resulting coalesced cell to bring it to a new stable configuration. This modified flow configuration results in reduced convective heat transfer, which is beneficial in several industrial processes. Therefore, our work both shows the potential of MARL DRL for controlling large RBC systems, as well as demonstrates the possibility for DRL to discover strategies that move the RBC configuration between different topological configurations, yielding desirable heat-transfer characteristics. These results are useful for both gaining further understanding of the intrinsic properties of RBC, as well as for developing industrial applications.
    GenPhys: From Physical Processes to Generative Models. (arXiv:2304.02637v1 [cs.LG])
    Since diffusion models (DM) and the more recent Poisson flow generative models (PFGM) are inspired by physical processes, it is reasonable to ask: Can physical processes offer additional new generative models? We show that the answer is yes. We introduce a general family, Generative Models from Physical Processes (GenPhys), where we translate partial differential equations (PDEs) describing physical processes to generative models. We show that generative models can be constructed from s-generative PDEs (s for smooth). GenPhys subsume the two existing generative models (DM and PFGM) and even give rise to new families of generative models, e.g., "Yukawa Generative Models" inspired from weak interactions. On the other hand, some physical processes by default do not belong to the GenPhys family, e.g., the wave equation and the Schr\"{o}dinger equation, but could be made into the GenPhys family with some modifications. Our goal with GenPhys is to explore and expand the design space of generative models.
    Optimal Energy Storage Scheduling for Wind Curtailment Reduction and Energy Arbitrage: A Deep Reinforcement Learning Approach. (arXiv:2304.02239v1 [cs.LG])
    Wind energy has been rapidly gaining popularity as a means for combating climate change. However, the variable nature of wind generation can undermine system reliability and lead to wind curtailment, causing substantial economic losses to wind power producers. Battery energy storage systems (BESS) that serve as onsite backup sources are among the solutions to mitigate wind curtailment. However, such an auxiliary role of the BESS might severely weaken its economic viability. This paper addresses the issue by proposing joint wind curtailment reduction and energy arbitrage for the BESS. We decouple the market participation of the co-located wind-battery system and develop a joint-bidding framework for the wind farm and BESS. It is challenging to optimize the joint-bidding because of the stochasticity of energy prices and wind generation. Therefore, we leverage deep reinforcement learning to maximize the overall revenue from the spot market while unlocking the BESS's potential in concurrently reducing wind curtailment and conducting energy arbitrage. We validate the proposed strategy using realistic wind farm data and demonstrate that our joint-bidding strategy responds better to wind curtailment and generates higher revenues than the optimization-based benchmark. Our simulations also reveal that the extra wind generation used to be curtailed can be an effective power source to charge the BESS, resulting in additional financial returns.
    DeepGANTT: A Scalable Deep Learning Scheduler for Backscatter Networks. (arXiv:2112.12985v2 [cs.LG] UPDATED)
    Novel backscatter communication techniques enable battery-free sensor tags to interoperate with unmodified standard IoT devices, extending a sensor network's capabilities in a scalable manner. Without requiring additional dedicated infrastructure, the battery-free tags harvest energy from the environment, while the IoT devices provide them with the unmodulated carrier they need to communicate. A schedule coordinates the provision of carriers for the communications of battery-free devices with IoT nodes. Optimal carrier scheduling is an NP-hard problem that limits the scalability of network deployments. Thus, existing solutions waste energy and other valuable resources by scheduling the carriers suboptimally. We present DeepGANTT, a deep learning scheduler that leverages graph neural networks to efficiently provide near-optimal carrier scheduling. We train our scheduler with relatively small optimal schedules obtained from a constraint optimization solver, achieving a performance within 3% of the optimal scheduler. Without the need to retrain, DeepGANTT generalizes to networks 6x larger in the number of nodes and 10x larger in the number of tags than those used for training, breaking the scalability limitations of the optimal scheduler and reducing carrier utilization by up to 50% compared to the state-of-the-art heuristic. Our scheduler efficiently reduces energy and spectrum utilization in backscatter networks.
    Goal-Conditioned Imitation Learning using Score-based Diffusion Policies. (arXiv:2304.02532v1 [cs.LG])
    We propose a new policy representation based on score-based diffusion models (SDMs). We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL) to learn general-purpose goal-specified policies from large uncurated datasets without rewards. Our new goal-conditioned policy architecture "$\textbf{BE}$havior generation with $\textbf{S}$c$\textbf{O}$re-based Diffusion Policies" (BESO) leverages a generative, score-based diffusion model as its policy. BESO decouples the learning of the score model from the inference sampling process, and, hence allows for fast sampling strategies to generate goal-specified behavior in just 3 denoising steps, compared to 30+ steps of other diffusion based policies. Furthermore, BESO is highly expressive and can effectively capture multi-modality present in the solution space of the play data. Unlike previous methods such as Latent Plans or C-Bet, BESO does not rely on complex hierarchical policies or additional clustering for effective goal-conditioned behavior learning. Finally, we show how BESO can even be used to learn a goal-independent policy from play-data using classifier-free guidance. To the best of our knowledge this is the first work that a) represents a behavior policy based on such a decoupled SDM b) learns an SDM based policy in the domain of GCIL and c) provides a way to simultaneously learn a goal-dependent and a goal-independent policy from play-data. We evaluate BESO through detailed simulation and show that it consistently outperforms several state-of-the-art goal-conditioned imitation learning methods on challenging benchmarks. We additionally provide extensive ablation studies and experiments to demonstrate the effectiveness of our method for effective goal-conditioned behavior generation.
    How good Neural Networks interpretation methods really are? A quantitative benchmark. (arXiv:2304.02383v1 [cs.LG])
    Saliency Maps (SMs) have been extensively used to interpret deep learning models decision by highlighting the features deemed relevant by the model. They are used on highly nonlinear problems, where linear feature selection (FS) methods fail at highlighting relevant explanatory variables. However, the reliability of gradient-based feature attribution methods such as SM has mostly been only qualitatively (visually) assessed, and quantitative benchmarks are currently missing, partially due to the lack of a definite ground truth on image data. Concerned about the apophenic biases introduced by visual assessment of these methods, in this paper we propose a synthetic quantitative benchmark for Neural Networks (NNs) interpretation methods. For this purpose, we built synthetic datasets with nonlinearly separable classes and increasing number of decoy (random) features, illustrating the challenge of FS in high-dimensional settings. We also compare these methods to conventional approaches such as mRMR or Random Forests. Our results show that our simple synthetic datasets are sufficient to challenge most of the benchmarked methods. TreeShap, mRMR and LassoNet are the best performing FS methods. We also show that, when quantifying the relevance of a few non linearly-entangled predictive features diluted in a large number of irrelevant noisy variables, neural network-based FS and interpretation methods are still far from being reliable.
    Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT. (arXiv:2304.02160v1 [cs.SD])
    In spite of the progress in music source separation research, the small amount of publicly-available clean source data remains a constant limiting factor for performance. Thus, recent advances in self-supervised learning present a largely-unexplored opportunity for improving separation models by leveraging unlabelled music data. In this paper, we propose a self-supervised learning framework for music source separation inspired by the HuBERT speech representation model. We first investigate the potential impact of the original HuBERT model by inserting an adapted version of it into the well-known Demucs V2 time-domain separation model architecture. We then propose a time-frequency-domain self-supervised model, Pac-HuBERT (for primitive auditory clustering HuBERT), that we later use in combination with a Res-U-Net decoder for source separation. Pac-HuBERT uses primitive auditory features of music as unsupervised clustering labels to initialize the self-supervised pretraining process using the Free Music Archive (FMA) dataset. The resulting framework achieves better source-to-distortion ratio (SDR) performance on the MusDB18 test set than the original Demucs V2 and Res-U-Net models. We further demonstrate that it can boost performance with small amounts of supervised data. Ultimately, our proposed framework is an effective solution to the challenge of limited clean source data for music source separation.
    Treatment Allocation with Strategic Agents. (arXiv:2011.06528v5 [econ.EM] UPDATED)
    There is increasing interest in allocating treatments based on observed individual characteristics: examples include targeted marketing, individualized credit offers, and heterogeneous pricing. Treatment personalization introduces incentives for individuals to modify their behavior to obtain a better treatment. Strategic behavior shifts the joint distribution of covariates and potential outcomes. The optimal rule without strategic behavior allocates treatments only to those with a positive Conditional Average Treatment Effect. With strategic behavior, we show that the optimal rule can involve randomization, allocating treatments with less than 100% probability even to those who respond positively on average to the treatment. We propose a sequential experiment based on Bayesian Optimization that converges to the optimal treatment rule without parametric assumptions on individual strategic behavior.
    Quantum Imitation Learning. (arXiv:2304.02480v1 [quant-ph])
    Despite remarkable successes in solving various complex decision-making tasks, training an imitation learning (IL) algorithm with deep neural networks (DNNs) suffers from the high computation burden. In this work, we propose quantum imitation learning (QIL) with a hope to utilize quantum advantage to speed up IL. Concretely, we develop two QIL algorithms, quantum behavioural cloning (Q-BC) and quantum generative adversarial imitation learning (Q-GAIL). Q-BC is trained with a negative log-likelihood loss in an off-line manner that suits extensive expert data cases, whereas Q-GAIL works in an inverse reinforcement learning scheme, which is on-line and on-policy that is suitable for limited expert data cases. For both QIL algorithms, we adopt variational quantum circuits (VQCs) in place of DNNs for representing policies, which are modified with data re-uploading and scaling parameters to enhance the expressivity. We first encode classical data into quantum states as inputs, then perform VQCs, and finally measure quantum outputs to obtain control signals of agents. Experiment results demonstrate that both Q-BC and Q-GAIL can achieve comparable performance compared to classical counterparts, with the potential of quantum speed-up. To our knowledge, we are the first to propose the concept of QIL and conduct pilot studies, which paves the way for the quantum era.
    Hyper-parameter Tuning for Adversarially Robust Models. (arXiv:2304.02497v1 [cs.LG])
    This work focuses on the problem of hyper-parameter tuning (HPT) for robust (i.e., adversarially trained) models, with the twofold goal of i) establishing which additional HPs are relevant to tune in adversarial settings, and ii) reducing the cost of HPT for robust models. We pursue the first goal via an extensive experimental study based on 3 recent models widely adopted in the prior literature on adversarial robustness. Our findings show that the complexity of the HPT problem, already notoriously expensive, is exacerbated in adversarial settings due to two main reasons: i) the need of tuning additional HPs which balance standard and adversarial training; ii) the need of tuning the HPs of the standard and adversarial training phases independently. Fortunately, we also identify new opportunities to reduce the cost of HPT for robust models. Specifically, we propose to leverage cheap adversarial training methods to obtain inexpensive, yet highly correlated, estimations of the quality achievable using state-of-the-art methods (PGD). We show that, by exploiting this novel idea in conjunction with a recent multi-fidelity optimizer (taKG), the efficiency of the HPT process can be significantly enhanced.
    High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation. (arXiv:2304.02621v1 [cs.CV])
    The task of image-level weakly-supervised semantic segmentation (WSSS) has gained popularity in recent years, as it reduces the vast data annotation cost for training segmentation models. The typical approach for WSSS involves training an image classification network using global average pooling (GAP) on convolutional feature maps. This enables the estimation of object locations based on class activation maps (CAMs), which identify the importance of image regions. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, to supervise a segmentation model in the absence of pixel-level ground truth. In case of the SEAM baseline, a previous work proposed to improve CAM learning in two ways: (1) Importance sampling, which is a substitute for GAP, and (2) the feature similarity loss, which utilizes a heuristic that object contours almost exclusively align with color edges in images. In this work, we propose a different probabilistic interpretation of CAMs for these techniques, rendering the likelihood more appropriate than the multinomial posterior. As a result, we propose an add-on method that can boost essentially any previous WSSS method, improving both the region similarity and contour quality of all implemented state-of-the-art baselines. This is demonstrated on a wide variety of baselines on the PASCAL VOC dataset. Experiments on the MS COCO dataset show that performance gains can also be achieved in a large-scale setting. Our code is available at https://github.com/arvijj/hfpl.
    Improving Semiconductor Device Modeling for Electronic Design Automation by Machine Learning Techniques. (arXiv:2105.11453v2 [cs.LG] UPDATED)
    The semiconductors industry benefits greatly from the integration of Machine Learning (ML)-based techniques in Technology Computer-Aided Design (TCAD) methods. The performance of ML models however relies heavily on the quality and quantity of training datasets. They can be particularly difficult to obtain in the semiconductor industry due to the complexity and expense of the device fabrication. In this paper, we propose a self-augmentation strategy for improving ML-based device modeling using variational autoencoder-based techniques. These techniques require a small number of experimental data points and does not rely on TCAD tools. To demonstrate the effectiveness of our approach, we apply it to a deep neural network-based prediction task for the Ohmic resistance value in Gallium Nitride devices. A 70% reduction in mean absolute error when predicting experimental results is achieved. The inherent flexibility of our approach allows easy adaptation to various tasks, thus making it highly relevant to many applications of the semiconductor industry.
    A Neural Network Approach for Selecting Track-like Events in Fluorescence Telescope Data. (arXiv:2212.03787v2 [astro-ph.IM] UPDATED)
    In 2016-2017, TUS, the world's first experiment for testing the possibility of registering ultra-high energy cosmic rays (UHECRs) by their fluorescent radiation in the night atmosphere of Earth was carried out. Since 2019, the Russian-Italian fluorescence telescope (FT) Mini-EUSO ("UV Atmosphere") has been operating on the ISS. The stratospheric experiment EUSO-SPB2, which will employ an FT for registering UHECRs, is planned for 2023. We show how a simple convolutional neural network can be effectively used to find track-like events in the variety of data obtained with such instruments.
    A force-sensing surgical drill for real-time force feedback in robotic mastoidectomy. (arXiv:2304.02583v1 [cs.RO])
    Purpose: Robotic assistance in otologic surgery can reduce the task load of operating surgeons during the removal of bone around the critical structures in the lateral skull base. However, safe deployment into the anatomical passageways necessitates the development of advanced sensing capabilities to actively limit the interaction forces between the surgical tools and critical anatomy. Methods: We introduce a surgical drill equipped with a force sensor that is capable of measuring accurate tool-tissue interaction forces to enable force control and feedback to surgeons. The design, calibration and validation of the force-sensing surgical drill mounted on a cooperatively controlled surgical robot are described in this work. Results: The force measurements on the tip of the surgical drill are validated with raw-egg drilling experiments, where a force sensor mounted below the egg serves as ground truth. The average root mean square error (RMSE) for points and path drilling experiments are 41.7 (pm 12.2) mN and 48.3 (pm 13.7) mN respectively. Conclusions: The force-sensing prototype measures forces with sub-millinewton resolution and the results demonstrate that the calibrated force-sensing drill generates accurate force measurements with minimal error compared to the measured drill forces. The development of such sensing capabilities is crucial for the safe use of robotic systems in a clinical context.
    Newton methods based convolution neural networks using parallel processing. (arXiv:2112.01401v3 [cs.LG] UPDATED)
    Training of convolutional neural networks is a high dimensional and a non-convex optimization problem. At present, it is inefficient in situations where parametric learning rates can not be confidently set. Some past works have introduced Newton methods for training deep neural networks. Newton methods for convolutional neural networks involve complicated operations. Finding the Hessian matrix in second-order methods becomes very complex as we mainly use the finite differences method with the image data. Newton methods for convolutional neural networks deals with this by using the sub-sampled Hessian Newton methods. In this paper, we have used the complete data instead of the sub-sampled methods that only handle partial data at a time. Further, we have used parallel processing instead of serial processing in mini-batch computations. The results obtained using parallel processing in this study, outperform the time taken by the previous approach.
    Disentangling Structure and Style: Political Bias Detection in News by Inducing Document Hierarchy. (arXiv:2304.02247v1 [cs.CL])
    We address an important gap in detection of political bias in news articles. Previous works that perform supervised document classification can be biased towards the writing style of each news outlet, leading to overfitting and limited generalizability. Our approach overcomes this limitation by considering both the sentence-level semantics and the document-level rhetorical structure, resulting in a more robust and style-agnostic approach to detecting political bias in news articles. We introduce a novel multi-head hierarchical attention model that effectively encodes the structure of long documents through a diverse ensemble of attention heads. While journalism follows a formalized rhetorical structure, the writing style may vary by news outlet. We demonstrate that our method overcomes this domain dependency and outperforms previous approaches for robustness and accuracy. Further analysis demonstrates the ability of our model to capture the discourse structures commonly used in the journalism domain.
    Rethinking the Trigger-injecting Position in Graph Backdoor Attack. (arXiv:2304.02277v1 [cs.LG])
    Backdoor attacks have been demonstrated as a security threat for machine learning models. Traditional backdoor attacks intend to inject backdoor functionality into the model such that the backdoored model will perform abnormally on inputs with predefined backdoor triggers and still retain state-of-the-art performance on the clean inputs. While there are already some works on backdoor attacks on Graph Neural Networks (GNNs), the backdoor trigger in the graph domain is mostly injected into random positions of the sample. There is no work analyzing and explaining the backdoor attack performance when injecting triggers into the most important or least important area in the sample, which we refer to as trigger-injecting strategies MIAS and LIAS, respectively. Our results show that, generally, LIAS performs better, and the differences between the LIAS and MIAS performance can be significant. Furthermore, we explain these two strategies' similar (better) attack performance through explanation techniques, which results in a further understanding of backdoor attacks in GNNs.
    Explaining Multimodal Data Fusion: Occlusion Analysis for Wilderness Mapping. (arXiv:2304.02407v1 [cs.CV])
    Jointly harnessing complementary features of multi-modal input data in a common latent space has been found to be beneficial long ago. However, the influence of each modality on the models decision remains a puzzle. This study proposes a deep learning framework for the modality-level interpretation of multimodal earth observation data in an end-to-end fashion. While leveraging an explainable machine learning method, namely Occlusion Sensitivity, the proposed framework investigates the influence of modalities under an early-fusion scenario in which the modalities are fused before the learning process. We show that the task of wilderness mapping largely benefits from auxiliary data such as land cover and night time light data.
    Correcting Flaws in Common Disentanglement Metrics. (arXiv:2304.02335v1 [cs.LG])
    Recent years have seen growing interest in learning disentangled representations, in which distinct features, such as size or shape, are represented by distinct neurons. Quantifying the extent to which a given representation is disentangled is not straightforward; multiple metrics have been proposed. In this paper, we identify two failings of existing metrics, which mean they can assign a high score to a model which is still entangled, and we propose two new metrics, which redress these problems. We then consider the task of compositional generalization. Unlike prior works, we treat this as a classification problem, which allows us to use it to measure the disentanglement ability of the encoder, without depending on the decoder. We show that performance on this task is (a) generally quite poor, (b) correlated with most disentanglement metrics, and (c) most strongly correlated with our newly proposed metrics.
    Opening the random forest black box by the analysis of the mutual impact of features. (arXiv:2304.02490v1 [cs.LG])
    Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships between the features are usually not considered for the selection and thus also neglected for the characterization of the analysed samples. Here we propose two novel approaches that focus on the mutual impact of features in random forests. Mutual forest impact (MFI) is a relation parameter that evaluates the mutual association of the featurs to the outcome and, hence, goes beyond the analysis of correlation coefficients. Mutual impurity reduction (MIR) is an importance measure that combines this relation parameter with the importance of the individual features. MIR and MFI are implemented together with testing procedures that generate p-values for the selection of related and important features. Applications to various simulated data sets and the comparison to other methods for feature selection and relation analysis show that MFI and MIR are very promising to shed light on the complex relationships between features and outcome. In addition, they are not affected by common biases, e.g. that features with many possible splits or high minor allele frequencies are prefered.
    A system for exploring big data: an iterative k-means searchlight for outlier detection on open health data. (arXiv:2304.02189v1 [cs.LG])
    The interactive exploration of large and evolving datasets is challenging as relationships between underlying variables may not be fully understood. There may be hidden trends and patterns in the data that are worthy of further exploration and analysis. We present a system that methodically explores multiple combinations of variables using a searchlight technique and identifies outliers. An iterative k-means clustering algorithm is applied to features derived through a split-apply-combine paradigm used in the database literature. Outliers are identified as singleton or small clusters. This algorithm is swept across the dataset in a searchlight manner. The dimensions that contain outliers are combined in pairs with other dimensions using a susbset scan technique to gain further insight into the outliers. We illustrate this system by anaylzing open health care data released by New York State. We apply our iterative k-means searchlight followed by subset scanning. Several anomalous trends in the data are identified, including cost overruns at specific hospitals, and increases in diagnoses such as suicides. These constitute novel findings in the literature, and are of potential use to regulatory agencies, policy makers and concerned citizens.
    Sequential Linearithmic Time Optimal Unimodal Fitting When Minimizing Univariate Linear Losses. (arXiv:2304.02141v1 [cs.LG])
    This paper focuses on optimal unimodal transformation of the score outputs of a univariate learning model under linear loss functions. We demonstrate that the optimal mapping between score values and the target region is a rectangular function. To produce this optimal rectangular fit for the observed samples, we propose a sequential approach that can its estimation with each incoming new sample. Our approach has logarithmic time complexity per iteration and is optimally efficient.
    Bayesian neural networks via MCMC: a Python-based tutorial. (arXiv:2304.02595v1 [stat.ML])
    Bayesian inference provides a methodology for parameter estimation and uncertainty quantification in machine learning and deep learning methods. Variational inference and Markov Chain Monte-Carlo (MCMC) sampling techniques are used to implement Bayesian inference. In the past three decades, MCMC methods have faced a number of challenges in being adapted to larger models (such as in deep learning) and big data problems. Advanced proposals that incorporate gradients, such as a Langevin proposal distribution, provide a means to address some of the limitations of MCMC sampling for Bayesian neural networks. Furthermore, MCMC methods have typically been constrained to use by statisticians and are still not prominent among deep learning researchers. We present a tutorial for MCMC methods that covers simple Bayesian linear and logistic models, and Bayesian neural networks. The aim of this tutorial is to bridge the gap between theory and implementation via coding, given a general sparsity of libraries and tutorials to this end. This tutorial provides code in Python with data and instructions that enable their use and extension. We provide results for some benchmark problems showing the strengths and weaknesses of implementing the respective Bayesian models via MCMC. We highlight the challenges in sampling multi-modal posterior distributions in particular for the case of Bayesian neural networks, and the need for further improvement of convergence diagnosis.
    Selecting Features by their Resilience to the Curse of Dimensionality. (arXiv:2304.02455v1 [cs.LG])
    Real-world datasets are often of high dimension and effected by the curse of dimensionality. This hinders their comprehensibility and interpretability. To reduce the complexity feature selection aims to identify features that are crucial to learn from said data. While measures of relevance and pairwise similarities are commonly used, the curse of dimensionality is rarely incorporated into the process of selecting features. Here we step in with a novel method that identifies the features that allow to discriminate data subsets of different sizes. By adapting recent work on computing intrinsic dimensionalities, our method is able to select the features that can discriminate data and thus weaken the curse of dimensionality. Our experiments show that our method is competitive and commonly outperforms established feature selection methods. Furthermore, we propose an approximation that allows our method to scale to datasets consisting of millions of data points. Our findings suggest that features that discriminate data and are connected to a low intrinsic dimensionality are meaningful for learning procedures.
    On the universal approximation property of radial basis function neural networks. (arXiv:2304.02220v1 [cs.LG])
    In this paper we consider a new class of RBF (Radial Basis Function) neural networks, in which smoothing factors are replaced with shifts. We prove under certain conditions on the activation function that these networks are capable of approximating any continuous multivariate function on any compact subset of the $d$-dimensional Euclidean space. For RBF networks with finitely many fixed centroids we describe conditions guaranteeing approximation with arbitrary precision.
    CFLIT: Coexisting Federated Learning and Information Transfer. (arXiv:2207.12884v3 [cs.IT] UPDATED)
    Future wireless networks are expected to support diverse mobile services, including artificial intelligence (AI) services and ubiquitous data transmissions. Federated learning (FL), as a revolutionary learning approach, enables collaborative AI model training across distributed mobile edge devices. By exploiting the superposition property of multiple-access channels, over-the-air computation allows concurrent model uploading from massive devices over the same radio resources, and thus significantly reduces the communication cost of FL. In this paper, we study the coexistence of over-the-air FL and traditional information transfer (IT) in a mobile edge network. We propose a coexisting federated learning and information transfer (CFLIT) communication framework, where the FL and IT devices share the wireless spectrum in an OFDM system. Under this framework, we aim to maximize the IT data rate and guarantee a given FL convergence performance by optimizing the long-term radio resource allocation. A key challenge that limits the spectrum efficiency of the coexisting system lies in the large overhead incurred by frequent communication between the server and edge devices for FL model aggregation. To address the challenge, we rigorously analyze the impact of the computation-to-communication ratio on the convergence of over-the-air FL in wireless fading channels. The analysis reveals the existence of an optimal computation-to-communication ratio that minimizes the amount of radio resources needed for over-the-air FL to converge to a given error tolerance. Based on the analysis, we propose a low-complexity online algorithm to jointly optimize the radio resource allocation for both the FL devices and IT devices. Extensive numerical simulations verify the superior performance of the proposed design for the coexistence of FL and IT devices in wireless cellular systems.
    Bayesian Model Selection of Lithium-Ion Battery Models via Bayesian Quadrature. (arXiv:2210.17299v4 [stat.ME] UPDATED)
    A wide variety of battery models are available, and it is not always obvious which model `best' describes a dataset. This paper presents a Bayesian model selection approach using Bayesian quadrature. The model evidence is adopted as the selection metric, choosing the simplest model that describes the data, in the spirit of Occam's razor. However, estimating this requires integral computations over parameter space, which is usually prohibitively expensive. Bayesian quadrature offers sample-efficient integration via model-based inference that minimises the number of battery model evaluations. The posterior distribution of model parameters can also be inferred as a byproduct without further computation. Here, the simplest lithium-ion battery models, equivalent circuit models, were used to analyse the sensitivity of the selection criterion to given different datasets and model configurations. We show that popular model selection criteria, such as root-mean-square error and Bayesian information criterion, can fail to select a parsimonious model in the case of a multimodal posterior. The model evidence can spot the optimal model in such cases, simultaneously providing the variance of the evidence inference itself as an indication of confidence. We also show that Bayesian quadrature can compute the evidence faster than popular Monte Carlo based solvers.
    Mixed Regression via Approximate Message Passing. (arXiv:2304.02229v1 [stat.ML])
    We study the problem of regression in a generalized linear model (GLM) with multiple signals and latent variables. This model, which we call a matrix GLM, covers many widely studied problems in statistical learning, including mixed linear regression, max-affine regression, and mixture-of-experts. In mixed linear regression, each observation comes from one of $L$ signal vectors (regressors), but we do not know which one; in max-affine regression, each observation comes from the maximum of $L$ affine functions, each defined via a different signal vector. The goal in all these problems is to estimate the signals, and possibly some of the latent variables, from the observations. We propose a novel approximate message passing (AMP) algorithm for estimation in a matrix GLM and rigorously characterize its performance in the high-dimensional limit. This characterization is in terms of a state evolution recursion, which allows us to precisely compute performance measures such as the asymptotic mean-squared error. The state evolution characterization can be used to tailor the AMP algorithm to take advantage of any structural information known about the signals. Using state evolution, we derive an optimal choice of AMP `denoising' functions that minimizes the estimation error in each iteration. The theoretical results are validated by numerical simulations for mixed linear regression, max-affine regression, and mixture-of-experts. For max-affine regression, we propose an algorithm that combines AMP with expectation-maximization to estimate intercepts of the model along with the signals. The numerical results show that AMP significantly outperforms other estimators for mixed linear regression and max-affine regression in most parameter regimes.
    Segmentation of Planning Target Volume in CT Series for Total Marrow Irradiation Using U-Net. (arXiv:2304.02353v1 [cs.CV])
    Radiotherapy (RT) is a key component in the treatment of various cancers, including Acute Lymphocytic Leukemia (ALL) and Acute Myelogenous Leukemia (AML). Precise delineation of organs at risk (OARs) and target areas is essential for effective treatment planning. Intensity Modulated Radiotherapy (IMRT) techniques, such as Total Marrow Irradiation (TMI) and Total Marrow and Lymph node Irradiation (TMLI), provide more precise radiation delivery compared to Total Body Irradiation (TBI). However, these techniques require time-consuming manual segmentation of structures in Computerized Tomography (CT) scans by the Radiation Oncologist (RO). In this paper, we present a deep learning-based auto-contouring method for segmenting Planning Target Volume (PTV) for TMLI treatment using the U-Net architecture. We trained and compared two segmentation models with two different loss functions on a dataset of 100 patients treated with TMLI at the Humanitas Research Hospital between 2011 and 2021. Despite challenges in lymph node areas, the best model achieved an average Dice score of 0.816 for PTV segmentation. Our findings are a preliminary but significant step towards developing a segmentation model that has the potential to save radiation oncologists a considerable amount of time. This could allow for the treatment of more patients, resulting in improved clinical practice efficiency and more reproducible contours.
    A Characterization of Multioutput Learnability. (arXiv:2301.02729v4 [cs.LG] UPDATED)
    We consider the problem of learning multioutput function classes in batch and online settings. In both settings, we show that a multioutput function class is learnable if and only if each single-output restriction of the function class is learnable. This provides a complete characterization of the learnability of multilabel classification and multioutput regression in both batch and online settings. As an extension, we also consider multilabel learnability in the bandit feedback setting and show a similar characterization as in the full-feedback setting.
    PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data. (arXiv:2304.02208v1 [cs.LG])
    With calls for increasing transparency, governments are releasing greater amounts of data in multiple domains including finance, education and healthcare. The efficient exploratory analysis of healthcare data constitutes a significant challenge. Key concerns in public health include the quick identification and analysis of trends, and the detection of outliers. This allows policies to be rapidly adapted to changing circumstances. We present an efficient outlier detection technique, termed PIKS (Pruned iterative-k means searchlight), which combines an iterative k-means algorithm with a pruned searchlight based scan. We apply this technique to identify outliers in two publicly available healthcare datasets from the New York Statewide Planning and Research Cooperative System, and California's Office of Statewide Health Planning and Development. We provide a comparison of our technique with three other existing outlier detection techniques, consisting of auto-encoders, isolation forests and feature bagging. We identified outliers in conditions including suicide rates, immunity disorders, social admissions, cardiomyopathies, and pregnancy in the third trimester. We demonstrate that the PIKS technique produces results consistent with other techniques such as the auto-encoder. However, the auto-encoder needs to be trained, which requires several parameters to be tuned. In comparison, the PIKS technique has far fewer parameters to tune. This makes it advantageous for fast, "out-of-the-box" data exploration. The PIKS technique is scalable and can readily ingest new datasets. Hence, it can provide valuable, up-to-date insights to citizens, patients and policy-makers. We have made our code open source, and with the availability of open data, other researchers can easily reproduce and extend our work. This will help promote a deeper understanding of healthcare policies and public health issues.
    Vision Learners Meet Web Image-Text Pairs. (arXiv:2301.07088v2 [cs.CV] UPDATED)
    Most recent self-supervised learning methods are pre-trained on the well-curated ImageNet-1K dataset. In this work, given the excellent scalability of web data, we consider self-supervised pre-training on noisy web sourced image-text paired data. First, we conduct a benchmark study of representative self-supervised pre-training methods on large-scale web data in a like-for-like setting. We compare a range of methods, including single-modal ones that use masked training objectives and multi-modal ones that use image-text constrastive training. We observe that existing multi-modal methods do not outperform their single-modal counterparts on vision transfer learning tasks. We derive an information-theoretical view to explain these benchmark results, which provides insight into how to design a novel vision learner. Inspired by this insight, we present a new visual representation pre-training method, MUlti-modal Generator~(MUG), that learns from scalable web sourced image-text data. MUG achieves state-of-the-art transfer performance on a variety of tasks and demonstrates promising scaling properties. Pre-trained models and code will be made public upon acceptance.
    EigenFold: Generative Protein Structure Prediction with Diffusion Models. (arXiv:2304.02198v1 [q-bio.BM])
    Protein structure prediction has reached revolutionary levels of accuracy on single structures, yet distributional modeling paradigms are needed to capture the conformational ensembles and flexibility that underlie biological function. Towards this goal, we develop EigenFold, a diffusion generative modeling framework for sampling a distribution of structures from a given protein sequence. We define a diffusion process that models the structure as a system of harmonic oscillators and which naturally induces a cascading-resolution generative process along the eigenmodes of the system. On recent CAMEO targets, EigenFold achieves a median TMScore of 0.84, while providing a more comprehensive picture of model uncertainty via the ensemble of sampled structures relative to existing methods. We then assess EigenFold's ability to model and predict conformational heterogeneity for fold-switching proteins and ligand-induced conformational change. Code is available at https://github.com/bjing2016/EigenFold.
    Self-Supervised Siamese Autoencoders. (arXiv:2304.02549v1 [cs.LG])
    Fully supervised models often require large amounts of labeled training data, which tends to be costly and hard to acquire. In contrast, self-supervised representation learning reduces the amount of labeled data needed for achieving the same or even higher downstream performance. The goal is to pre-train deep neural networks on a self-supervised task such that afterwards the networks are able to extract meaningful features from raw input data. These features are then used as inputs in downstream tasks, such as image classification. Previously, autoencoders and Siamese networks such as SimSiam have been successfully employed in those tasks. Yet, challenges remain, such as matching characteristics of the features (e.g., level of detail) to the given task and data set. In this paper, we present a new self-supervised method that combines the benefits of Siamese architectures and denoising autoencoders. We show that our model, called SidAE (Siamese denoising autoencoder), outperforms two self-supervised baselines across multiple data sets, settings, and scenarios. Crucially, this includes conditions in which only a small amount of labeled data is available.
    InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis. (arXiv:2302.08624v3 [cs.CL] UPDATED)
    In this paper, we present InstructABSA, Aspect Based Sentiment Analysis (ABSA) using the instruction learning paradigm for all ABSA subtasks: Aspect Term Extraction (ATE), Aspect Term Sentiment Classification (ATSC), and Joint Task modeling. Our method introduces positive, negative, and neutral examples to each training sample, and instruction tunes the model (Tk-Instruct) for each ABSA subtask, yielding significant performance improvements. Experimental results on the Sem Eval 2014, 15, and 16 datasets demonstrate that InstructABSA outperforms the previous state-of-the-art (SOTA) approaches on all three ABSA subtasks (ATE, ATSC, and Joint Task) by a significant margin, outperforming 7x larger models. In particular, InstructABSA surpasses the SOTA on the Rest14 ATE subtask by 7.31% points, Rest15 ATSC subtask by and on the Lapt14 Joint Task by 8.63% points. Our results also suggest a strong generalization ability to new domains across all three subtasks
    Conformal Off-Policy Evaluation in Markov Decision Processes. (arXiv:2304.02574v1 [cs.LG])
    Reinforcement Learning aims at identifying and evaluating efficient control policies from data. In many real-world applications, the learner is not allowed to experiment and cannot gather data in an online manner (this is the case when experimenting is expensive, risky or unethical). For such applications, the reward of a given policy (the target policy) must be estimated using historical data gathered under a different policy (the behavior policy). Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees. We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty. The main challenge in OPE stems from the distribution shift due to the discrepancies between the target and the behavior policies. We propose and empirically evaluate different ways to deal with this shift. Some of these methods yield conformalized intervals with reduced length compared to existing approaches, while maintaining the same certainty level.
    Generalisation under gradient descent via deterministic PAC-Bayes. (arXiv:2209.02525v3 [stat.ML] UPDATED)
    We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.
    A Policy-Guided Imitation Approach for Offline Reinforcement Learning. (arXiv:2210.08323v3 [cs.LG] UPDATED)
    Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In this study, we propose an alternative approach, inheriting the training stability of imitation-style methods while still allowing logical out-of-distribution generalization. We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy. During training, the guide-poicy and execute-policy are learned using only data from the dataset, in a supervised and decoupled manner. During evaluation, the guide-policy guides the execute-policy by telling where it should go so that the reward can be maximized, serving as the \textit{Prophet}. By doing so, our algorithm allows \textit{state-compositionality} from the dataset, rather than \textit{action-compositionality} conducted in prior imitation-style methods. We dumb this new approach Policy-guided Offline RL (\texttt{POR}). \texttt{POR} demonstrates the state-of-the-art performance on D4RL, a standard benchmark for offline RL. We also highlight the benefits of \texttt{POR} in terms of improving with supplementary suboptimal data and easily adapting to new tasks by only changing the guide-poicy.
    List and Certificate Complexities in Replicable Learning. (arXiv:2304.02240v1 [cs.LG])
    We investigate replicable learning algorithms. Ideally, we would like to design algorithms that output the same canonical model over multiple runs, even when different runs observe a different set of samples from the unknown data distribution. In general, such a strong notion of replicability is not achievable. Thus we consider two feasible notions of replicability called list replicability and certificate replicability. Intuitively, these notions capture the degree of (non) replicability. We design algorithms for certain learning problems that are optimal in list and certificate complexity. We establish matching impossibility results.
    A Diffusion-based Method for Multi-turn Compositional Image Generation. (arXiv:2304.02192v1 [cs.CV])
    Multi-turn compositional image generation (M-CIG) is a challenging task that aims to iteratively manipulate a reference image given a modification text. While most of the existing methods for M-CIG are based on generative adversarial networks (GANs), recent advances in image generation have demonstrated the superiority of diffusion models over GANs. In this paper, we propose a diffusion-based method for M-CIG named conditional denoising diffusion with image compositional matching (CDD-ICM). We leverage CLIP as the backbone of image and text encoders, and incorporate a gated fusion mechanism, originally proposed for question answering, to compositionally fuse the reference image and the modification text at each turn of M-CIG. We introduce a conditioning scheme to generate the target image based on the fusion results. To prioritize the semantic quality of the generated target image, we learn an auxiliary image compositional match (ICM) objective, along with the conditional denoising diffusion (CDD) objective in a multi-task learning framework. Additionally, we also perform ICM guidance and classifier-free guidance to improve performance. Experimental results show that CDD-ICM achieves state-of-the-art results on two benchmark datasets for M-CIG, i.e., CoDraw and i-CLEVR.
    Optimal Sketching Bounds for Sparse Linear Regression. (arXiv:2304.02261v1 [cs.DS])
    We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows showing that $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight.  ( 3 min )
    Efficient CNNs via Passive Filter Pruning. (arXiv:2304.02319v1 [cs.LG])
    Convolutional neural networks (CNNs) have shown state-of-the-art performance in various applications. However, CNNs are resource-hungry due to their requirement of high computational complexity and memory storage. Recent efforts toward achieving computational efficiency in CNNs involve filter pruning methods that eliminate some of the filters in CNNs based on the \enquote{importance} of the filters. The majority of existing filter pruning methods are either "active", which use a dataset and generate feature maps to quantify filter importance, or "passive", which compute filter importance using entry-wise norm of the filters without involving data. Under a high pruning ratio where large number of filters are to be pruned from the network, the entry-wise norm methods eliminate relatively smaller norm filters without considering the significance of the filters in producing the node output, resulting in degradation in the performance. To address this, we present a passive filter pruning method where the filters are pruned based on their contribution in producing output by considering the operator norm of the filters. The proposed pruning method generalizes better across various CNNs compared to that of the entry-wise norm-based pruning methods. In comparison to the existing active filter pruning methods, the proposed pruning method is at least 4.5 times faster in computing filter importance and is able to achieve similar performance compared to that of the active filter pruning methods. The efficacy of the proposed pruning method is evaluated on audio scene classification and image classification using various CNNs architecture such as VGGish, DCASE21_Net, VGG-16 and ResNet-50.  ( 3 min )
    Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models. (arXiv:2304.02202v1 [cs.CV])
    Heatmaps are widely used to interpret deep neural networks, particularly for computer vision tasks, and the heatmap-based explainable AI (XAI) techniques are a well-researched topic. However, most studies concentrate on enhancing the quality of the generated heatmap or discovering alternate heatmap generation techniques, and little effort has been devoted to making heatmap-based XAI automatic, interactive, scalable, and accessible. To address this gap, we propose a framework that includes two modules: (1) context modelling and (2) reasoning. We proposed a template-based image captioning approach for context modelling to create text-based contextual information from the heatmap and input data. The reasoning module leverages a large language model to provide explanations in combination with specialised knowledge. Our qualitative experiments demonstrate the effectiveness of our framework and heatmap captioning approach. The code for the proposed template-based heatmap captioning approach will be publicly available.  ( 2 min )
    Synthesize Extremely High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model. (arXiv:2304.02169v1 [cs.LG])
    Synthetic electronic health records (EHRs) that are both realistic and preserve privacy can serve as an alternative to real EHRs for machine learning (ML) modeling and statistical analysis. However, generating high-fidelity and granular electronic health record (EHR) data in its original, highly-dimensional form poses challenges for existing methods due to the complexities inherent in high-dimensional data. In this paper, we propose Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal high-dimensional EHR, which preserve the statistical properties of real EHR and can be used to train accurate ML models without privacy concerns. Our HALO method, designed as a hierarchical autoregressive model, generates a probability density function of medical codes, clinical visits, and patient records, allowing for the generation of realistic EHR data in its original, unaggregated form without the need for variable selection or aggregation. Additionally, our model also produces high-quality continuous variables in a longitudinal and probabilistic manner. We conducted extensive experiments and demonstrate that HALO can generate high-fidelity EHR data with high-dimensional disease code probabilities (d > 10,000), disease co-occurrence probabilities within visits (d > 1,000,000), and conditional probabilities across consecutive visits (d > 5,000,000) and achieve above 0.9 R2 correlation in comparison to real EHR data. This performance then enables downstream ML models trained on its synthetic data to achieve comparable accuracy to models trained on real data (0.938 AUROC with HALO data vs. 0.943 with real data). Finally, using a combination of real and synthetic data enhances the accuracy of ML models beyond that achieved by using only real EHR data.  ( 3 min )
    Globalizing Fairness Attributes in Machine Learning: A Case Study on Health in Africa. (arXiv:2304.02190v1 [cs.LG])
    With growing machine learning (ML) applications in healthcare, there have been calls for fairness in ML to understand and mitigate ethical concerns these systems may pose. Fairness has implications for global health in Africa, which already has inequitable power imbalances between the Global North and South. This paper seeks to explore fairness for global health, with Africa as a case study. We propose fairness attributes for consideration in the African context and delineate where they may come into play in different ML-enabled medical modalities. This work serves as a basis and call for action for furthering research into fairness in global health.
    Short-Term Volatility Prediction Using Deep CNNs Trained on Order Flow. (arXiv:2304.02472v1 [q-fin.RM])
    As a newly emerged asset class, cryptocurrency is evidently more volatile compared to the traditional equity markets. Due to its mostly unregulated nature, and often low liquidity, the price of crypto assets can sustain a significant change within minutes that in turn might result in considerable losses. In this paper, we employ an approach for encoding market information into images and making predictions of short-term realized volatility by employing Convolutional Neural Networks. We then compare the performance of the proposed encoding and corresponding model with other benchmark models. The experimental results demonstrate that this representation of market data with a Convolutional Neural Network as a predictive model has the potential to better capture the market dynamics and a better volatility prediction.
    TM-vector: A Novel Forecasting Approach for Market stock movement with a Rich Representation of Twitter and Market data. (arXiv:2304.02094v1 [q-fin.ST])
    Stock market forecasting has been a challenging part for many analysts and researchers. Trend analysis, statistical techniques, and movement indicators have traditionally been used to predict stock price movements, but text extraction has emerged as a promising method in recent years. The use of neural networks, especially recurrent neural networks, is abundant in the literature. In most studies, the impact of different users was considered equal or ignored, whereas users can have other effects. In the current study, we will introduce TM-vector and then use this vector to train an IndRNN and ultimately model the market users' behaviour. In the proposed model, TM-vector is simultaneously trained with both the extracted Twitter features and market information. Various factors have been used for the effectiveness of the proposed forecasting approach, including the characteristics of each individual user, their impact on each other, and their impact on the market, to predict market direction more accurately. Dow Jones 30 index has been used in current work. The accuracy obtained for predicting daily stock changes of Apple is based on various models, closed to over 95\% and for the other stocks is significant. Our results indicate the effectiveness of TM-vector in predicting stock market direction.  ( 3 min )
    Learning to Compare Longitudinal Images. (arXiv:2304.02531v1 [eess.IV])
    Longitudinal studies, where a series of images from the same set of individuals are acquired at different time-points, represent a popular technique for studying and characterizing temporal dynamics in biomedical applications. The classical approach for longitudinal comparison involves normalizing for nuisance variations, such as image orientation or contrast differences, via pre-processing. Statistical analysis is, in turn, conducted to detect changes of interest, either at the individual or population level. This classical approach can suffer from pre-processing issues and limitations of the statistical modeling. For example, normalizing for nuisance variation might be hard in settings where there are a lot of idiosyncratic changes. In this paper, we present a simple machine learning-based approach that can alleviate these issues. In our approach, we train a deep learning model (called PaIRNet, for Pairwise Image Ranking Network) to compare pairs of longitudinal images, with or without supervision. In the self-supervised setup, for instance, the model is trained to temporally order the images, which requires learning to recognize time-irreversible changes. Our results from four datasets demonstrate that PaIRNet can be very effective in localizing and quantifying meaningful longitudinal changes while discounting nuisance variation. Our code is available at \url{https://github.com/heejong-kim/learning-to-compare-longitudinal-images.git}
    Local Intrinsic Dimensional Entropy. (arXiv:2304.02223v1 [cs.LG])
    Most entropy measures depend on the spread of the probability distribution over the sample space X, and the maximum entropy achievable scales proportionately with the sample space cardinality |X|. For a finite |X|, this yields robust entropy measures which satisfy many important properties, such as invariance to bijections, while the same is not true for continuous spaces (where |X|=infinity). Furthermore, since R and R^d (d in Z+) have the same cardinality (from Cantor's correspondence argument), cardinality-dependent entropy measures cannot encode the data dimensionality. In this work, we question the role of cardinality and distribution spread in defining entropy measures for continuous spaces, which can undergo multiple rounds of transformations and distortions, e.g., in neural networks. We find that the average value of the local intrinsic dimension of a distribution, denoted as ID-Entropy, can serve as a robust entropy measure for continuous spaces, while capturing the data dimensionality. We find that ID-Entropy satisfies many desirable properties and can be extended to conditional entropy, joint entropy and mutual-information variants. ID-Entropy also yields new information bottleneck principles and also links to causality. In the context of deep learning, for feedforward architectures, we show, theoretically and empirically, that the ID-Entropy of a hidden layer directly controls the generalization gap for both classifiers and auto-encoders, when the target function is Lipschitz continuous. Our work primarily shows that, for continuous spaces, taking a structural rather than a statistical approach yields entropy measures which preserve intrinsic data dimensionality, while being relevant for studying various architectures.  ( 3 min )
    ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention. (arXiv:2304.02147v1 [cs.CV])
    Recently, fully-transformer architectures have replaced the defacto convolutional architecture for the 3D human pose estimation task. In this paper we propose \textbf{\textit{ConvFormer}}, a novel convolutional transformer that leverages a new \textbf{\textit{dynamic multi-headed convolutional self-attention}} mechanism for monocular 3D human pose estimation. We designed a spatial and temporal convolutional transformer to comprehensively model human joint relations within individual frames and globally across the motion sequence. Moreover, we introduce a novel notion of \textbf{\textit{temporal joints profile}} for our temporal ConvFormer that fuses complete temporal information immediately for a local neighborhood of joint features. We have quantitatively and qualitatively validated our method on three common benchmark datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments have been conducted to identify the optimal hyper-parameter set. These experiments demonstrated that we achieved a \textbf{significant parameter reduction relative to prior transformer models} while attaining State-of-the-Art (SOTA) or near SOTA on all three datasets. Additionally, we achieved SOTA for Protocol III on H36M for both GT and CPN detection inputs. Finally, we obtained SOTA on all three metrics for the MPI-INF-3DHP dataset and for all three subjects on HumanEva under Protocol II.  ( 2 min )
    A step towards the applicability of algorithms based on invariant causal learning on observational data. (arXiv:2304.02286v1 [cs.LG])
    Machine learning can benefit from causal discovery for interpretation and from causal inference for generalization. In this line of research, a few invariant learning algorithms for out-of-distribution (OOD) generalization have been proposed by using multiple training environments to find invariant relationships. Some of them are focused on causal discovery as Invariant Causal Prediction (ICP), which finds causal parents of a variable of interest, and some directly provide a causal optimal predictor that generalizes well in OOD environments as Invariant Risk Minimization (IRM). This group of algorithms works under the assumption of multiple environments that represent different interventions in the causal inference context. Those environments are not normally available when working with observational data and real-world applications. Here we propose a method to generate them in an efficient way. We assess the performance of this unsupervised learning problem by implementing ICP on simulated data. We also show how to apply ICP efficiently integrated with our method for causal discovery. Finally, we proposed an improved version of our method in combination with ICP for datasets with multiple covariates where ICP and other causal discovery methods normally degrade in performance.  ( 2 min )
    Online Joint Assortment-Inventory Optimization under MNL Choices. (arXiv:2304.02022v1 [cs.LG])
    We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model, and the attraction parameters are unknown a priori. The retailer makes periodic assortment and inventory decisions to dynamically learn from the realized demands about the attraction parameters while maximizing the expected total profit over time. In this paper, we propose a novel algorithm that can effectively balance the exploration and exploitation in the online decision-making of assortment and inventory. Our algorithm builds on a new estimator for the MNL attraction parameters, a novel approach to incentivize exploration by adaptively tuning certain known and unknown parameters, and an optimization oracle to static single-cycle assortment-inventory planning problems with given parameters. We establish a regret upper bound for our algorithm and a lower bound for the online joint assortment-inventory optimization problem, suggesting that our algorithm achieves nearly optimal regret rate, provided that the static optimization oracle is exact. Then we incorporate more practical approximate static optimization oracles into our algorithm, and bound from above the impact of static optimization errors on the regret of our algorithm. At last, we perform numerical studies to demonstrate the effectiveness of our proposed algorithm.
    Scalable Online Learning of Approximate Stackelberg Solutions in Energy Trading Games with Demand Response Aggregators. (arXiv:2304.02086v1 [cs.LG])
    In this work, a Stackelberg game theoretic framework is proposed for trading energy bidirectionally between the demand-response (DR) aggregator and the prosumers. This formulation allows for flexible energy arbitrage and additional monetary rewards while ensuring that the prosumers' desired daily energy demand is met. Then, a scalable (with the number of prosumers) approach is proposed to find approximate equilibria based on online sampling and learning of the prosumers' cumulative best response. Moreover, bounds are provided on the quality of the approximate equilibrium solution. Last, real-world data from the California day-ahead energy market and the University of California at Davis building energy demands are utilized to demonstrate the efficacy of the proposed framework and the online scalable solution.
    Building predictive models of healthcare costs with open healthcare data. (arXiv:2304.02191v1 [cs.LG])
    Due to rapidly rising healthcare costs worldwide, there is significant interest in controlling them. An important aspect concerns price transparency, as preliminary efforts have demonstrated that patients will shop for lower costs, driving efficiency. This requires the data to be made available, and models that can predict healthcare costs for a wide range of patient demographics and conditions. We present an approach to this problem by developing a predictive model using machine-learning techniques. We analyzed de-identified patient data from New York State SPARCS (statewide planning and research cooperative system), consisting of 2.3 million records in 2016. We built models to predict costs from patient diagnoses and demographics. We investigated two model classes consisting of sparse regression and decision trees. We obtained the best performance by using a decision tree with depth 10. We obtained an R-square value of 0.76 which is better than the values reported in the literature for similar problems.  ( 2 min )
    Deep Learning for Automated Experimentation in Scanning Transmission Electron Microscopy. (arXiv:2304.02048v1 [cond-mat.mtrl-sci])
    Machine learning (ML) has become critical for post-acquisition data analysis in (scanning) transmission electron microscopy, (S)TEM, imaging and spectroscopy. An emerging trend is the transition to real-time analysis and closed-loop microscope operation. The effective use of ML in electron microscopy now requires the development of strategies for microscopy-centered experiment workflow design and optimization. Here, we discuss the associated challenges with the transition to active ML, including sequential data analysis and out-of-distribution drift effects, the requirements for the edge operation, local and cloud data storage, and theory in the loop operations. Specifically, we discuss the relative contributions of human scientists and ML agents in the ideation, orchestration, and execution of experimental workflows and the need to develop universal hyper languages that can apply across multiple platforms. These considerations will collectively inform the operationalization of ML in next-generation experimentation.  ( 2 min )
    Multi-Class Explainable Unlearning for Image Classification via Weight Filtering. (arXiv:2304.02049v1 [cs.CV])
    Machine Unlearning has recently been emerging as a paradigm for selectively removing the impact of training datapoints from a network. While existing approaches have focused on unlearning either a small subset of the training data or a single class, in this paper we take a different path and devise a framework that can unlearn all classes of an image classification network in a single untraining round. Our proposed technique learns to modulate the inner components of an image classification network through memory matrices so that, after training, the same network can selectively exhibit an unlearning behavior over any of the classes. By discovering weights which are specific to each of the classes, our approach also recovers a representation of the classes which is explainable by-design. We test the proposed framework, which we name Weight Filtering network (WF-Net), on small-scale and medium-scale image classification datasets, with both CNN and Transformer-based backbones. Our work provides interesting insights in the development of explainable solutions for unlearning and could be easily extended to other vision tasks.  ( 2 min )
    JPEG Compressed Images Can Bypass Protections Against AI Editing. (arXiv:2304.02234v1 [cs.LG])
    Recently developed text-to-image diffusion models make it easy to edit or create high-quality images. Their ease of use has raised concerns about the potential for malicious editing or deepfake creation. Imperceptible perturbations have been proposed as a means of protecting images from malicious editing by preventing diffusion models from generating realistic images. However, we find that the aforementioned perturbations are not robust to JPEG compression, which poses a major weakness because of the common usage and availability of JPEG. We discuss the importance of robustness for additive imperceptible perturbations and encourage alternative approaches to protect images against editing.  ( 2 min )
    Structure Learning with Continuous Optimization: A Sober Look and Beyond. (arXiv:2304.02146v1 [cs.LG])
    This paper investigates in which cases continuous optimization for directed acyclic graph (DAG) structure learning can and cannot perform well and why this happens, and suggests possible directions to make the search procedure more reliable. Reisach et al. (2021) suggested that the remarkable performance of several continuous structure learning approaches is primarily driven by a high agreement between the order of increasing marginal variances and the topological order, and demonstrated that these approaches do not perform well after data standardization. We analyze this phenomenon for continuous approaches assuming equal and non-equal noise variances, and show that the statement may not hold in either case by providing counterexamples, justifications, and possible alternative explanations. We further demonstrate that nonconvexity may be a main concern especially for the non-equal noise variances formulation, while recent advances in continuous structure learning fail to achieve improvement in this case. Our findings suggest that future works should take into account the non-equal noise variances formulation to handle more general settings and for a more comprehensive empirical evaluation. Lastly, we provide insights into other aspects of the search procedure, including thresholding and sparsity, and show that they play an important role in the final solutions.  ( 2 min )
    Algorithm-Dependent Bounds for Representation Learning of Multi-Source Domain Adaptation. (arXiv:2304.02064v1 [cs.LG])
    We use information-theoretic tools to derive a novel analysis of Multi-source Domain Adaptation (MDA) from the representation learning perspective. Concretely, we study joint distribution alignment for supervised MDA with few target labels and unsupervised MDA with pseudo labels, where the latter is relatively hard and less commonly studied. We further provide algorithm-dependent generalization bounds for these two settings, where the generalization is characterized by the mutual information between the parameters and the data. Then we propose a novel deep MDA algorithm, implicitly addressing the target shift through joint alignment. Finally, the mutual information bounds are extended to this algorithm providing a non-vacuous gradient-norm estimation. The proposed algorithm has comparable performance to the state-of-the-art on target-shifted MDA benchmark with improved memory efficiency.  ( 2 min )
    Hierarchically Fusing Long and Short-Term User Interests for Click-Through Rate Prediction in Product Search. (arXiv:2304.02089v1 [cs.IR])
    Estimating Click-Through Rate (CTR) is a vital yet challenging task in personalized product search. However, existing CTR methods still struggle in the product search settings due to the following three challenges including how to more effectively extract users' short-term interests with respect to multiple aspects, how to extract and fuse users' long-term interest with short-term interests, how to address the entangling characteristic of long and short-term interests. To resolve these challenges, in this paper, we propose a new approach named Hierarchical Interests Fusing Network (HIFN), which consists of four basic modules namely Short-term Interests Extractor (SIE), Long-term Interests Extractor (LIE), Interests Fusion Module (IFM) and Interests Disentanglement Module (IDM). Specifically, SIE is proposed to extract user's short-term interests by integrating three fundamental interests encoders within it namely query-dependent, target-dependent and causal-dependent interest encoder, respectively, followed by delivering the resultant representation to the module LIE, where it can effectively capture user long-term interests by devising an attention mechanism with respect to the short-term interests from SIE module. In IFM, the achieved long and short-term interests are further fused in an adaptive manner, followed by concatenating it with original raw context features for the final prediction result. Last but not least, considering the entangling characteristic of long and short-term interests, IDM further devises a self-supervised framework to disentangle long and short-term interests. Extensive offline and online evaluations on a real-world e-commerce platform demonstrate the superiority of HIFN over state-of-the-art methods.  ( 3 min )
    EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT. (arXiv:2304.02084v1 [cs.CV])
    We present a complete software pipeline for revealing the hidden texts of the Herculaneum papyri using X-ray CT images. This enhanced virtual unwrapping pipeline combines machine learning with a novel geometric framework linking 3D and 2D images. We also present EduceLab-Scrolls, a comprehensive open dataset representing two decades of research effort on this problem. EduceLab-Scrolls contains a set of volumetric X-ray CT images of both small fragments and intact, rolled scrolls. The dataset also contains 2D image labels that are used in the supervised training of an ink detection model. Labeling is enabled by aligning spectral photography of scroll fragments with X-ray CT images of the same fragments, thus creating a machine-learnable mapping between image spaces and modalities. This alignment permits supervised learning for the detection of "invisible" carbon ink in X-ray CT, a task that is "impossible" even for human expert labelers. To our knowledge, this is the first aligned dataset of its kind and is the largest dataset ever released in the heritage domain. Our method is capable of revealing accurate lines of text on scroll fragments with known ground truth. Revealed text is verified using visual confirmation, quantitative image metrics, and scholarly review. EduceLab-Scrolls has also enabled the discovery, for the first time, of hidden texts from the Herculaneum papyri, which we present here. We anticipate that the EduceLab-Scrolls dataset will generate more textual discovery as research continues.  ( 3 min )
    Deep learning for diffusion in porous media. (arXiv:2304.02104v1 [physics.comp-ph])
    We adopt convolutional neural networks (CNN) to predict the basic properties of the porous media. Two different media types are considered: one mimics the sandstone, and the other mimics the systems derived from the extracellular space of biological tissues. The Lattice Boltzmann Method is used to obtain the labeled data necessary for performing supervised learning. We distinguish two tasks. In the first, networks based on the analysis of the system's geometry predict porosity and effective diffusion coefficient. In the second, networks reconstruct the system's geometry and concentration map. In the first task, we propose two types of CNN models: the C-Net and the encoder part of the U-Net. Both networks are modified by adding a self-normalization module. The models predict with reasonable accuracy but only within the data type, they are trained on. For instance, the model trained on sandstone-like samples overshoots or undershoots for biological-like samples. In the second task, we propose the usage of the U-Net architecture. It accurately reconstructs the concentration fields. Moreover, the network trained on one data type works well for the other. For instance, the model trained on sandstone-like samples works perfectly on biological-like samples.  ( 2 min )
    Detecting Fake Job Postings Using Bidirectional LSTM. (arXiv:2304.02019v1 [cs.LG])
    Fake job postings have become prevalent in the online job market, posing significant challenges to job seekers and employers. Despite the growing need to address this problem, there is limited research that leverages deep learning techniques for the detection of fraudulent job advertisements. This study aims to fill the gap by employing a Bidirectional Long Short-Term Memory (Bi-LSTM) model to identify fake job advertisements. Our approach considers both numeric and text features, effectively capturing the underlying patterns and relationships within the data. The proposed model demonstrates a superior performance, achieving a 0.91 ROC AUC score and a 98.71% accuracy rate, indicating its potential for practical applications in the online job market. The findings of this research contribute to the development of robust, automated tools that can help combat the proliferation of fake job postings and improve the overall integrity of the job search process. Moreover, we discuss challenges, future research directions, and ethical considerations related to our approach, aiming to inspire further exploration and development of practical solutions to combat online job fraud.  ( 2 min )
    The CAMELS project: Expanding the galaxy formation model space with new ASTRID and 28-parameter TNG and SIMBA suites. (arXiv:2304.02096v1 [astro-ph.CO])
    We present CAMELS-ASTRID, the third suite of hydrodynamical simulations in the Cosmology and Astrophysics with MachinE Learning (CAMELS) project, along with new simulation sets that extend the model parameter space based on the previous frameworks of CAMELS-TNG and CAMELS-SIMBA, to provide broader training sets and testing grounds for machine-learning algorithms designed for cosmological studies. CAMELS-ASTRID employs the galaxy formation model following the ASTRID simulation and contains 2,124 hydrodynamic simulation runs that vary 3 cosmological parameters ($\Omega_m$, $\sigma_8$, $\Omega_b$) and 4 parameters controlling stellar and AGN feedback. Compared to the existing TNG and SIMBA simulation suites in CAMELS, the fiducial model of ASTRID features the mildest AGN feedback and predicts the least baryonic effect on the matter power spectrum. The training set of ASTRID covers a broader variation in the galaxy populations and the baryonic impact on the matter power spectrum compared to its TNG and SIMBA counterparts, which can make machine-learning models trained on the ASTRID suite exhibit better extrapolation performance when tested on other hydrodynamic simulation sets. We also introduce extension simulation sets in CAMELS that widely explore 28 parameters in the TNG and SIMBA models, demonstrating the enormity of the overall galaxy formation model parameter space and the complex non-linear interplay between cosmology and astrophysical processes. With the new simulation suites, we show that building robust machine-learning models favors training and testing on the largest possible diversity of galaxy formation models. We also demonstrate that it is possible to train accurate neural networks to infer cosmological parameters using the high-dimensional TNG-SB28 simulation set.  ( 3 min )
    To ChatGPT, or not to ChatGPT: That is the question!. (arXiv:2304.01487v2 [cs.LG] UPDATED)
    ChatGPT has become a global sensation. As ChatGPT and other Large Language Models (LLMs) emerge, concerns of misusing them in various ways increase, such as disseminating fake news, plagiarism, manipulating public opinion, cheating, and fraud. Hence, distinguishing AI-generated from human-generated becomes increasingly essential. Researchers have proposed various detection methodologies, ranging from basic binary classifiers to more complex deep-learning models. Some detection techniques rely on statistical characteristics or syntactic patterns, while others incorporate semantic or contextual information to improve accuracy. The primary objective of this study is to provide a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection. Additionally, we evaluated other AI-generated text detection tools that do not specifically claim to detect ChatGPT-generated content to assess their performance in detecting ChatGPT-generated content. For our evaluation, we have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains and user-generated responses from popular social networking platforms. The dataset serves as a reference to assess the performance of various techniques in detecting ChatGPT-generated content. Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
    Effective Theory of Transformers at Initialization. (arXiv:2304.02034v1 [cs.LG])
    We perform an effective-theory analysis of forward-backward signal propagation in wide and deep Transformers, i.e., residual neural networks with multi-head self-attention blocks and multilayer perceptron blocks. This analysis suggests particular width scalings of initialization and training hyperparameters for these models. We then take up such suggestions, training Vision and Language Transformers in practical setups.  ( 2 min )
    The Multimodal And Modular Ai Chef: Complex Recipe Generation From Imagery. (arXiv:2304.02016v1 [cs.CL])
    The AI community has embraced multi-sensory or multi-modal approaches to advance this generation of AI models to resemble expected intelligent understanding. Combining language and imagery represents a familiar method for specific tasks like image captioning or generation from descriptions. This paper compares these monolithic approaches to a lightweight and specialized method based on employing image models to label objects, then serially submitting this resulting object list to a large language model (LLM). This use of multiple Application Programming Interfaces (APIs) enables better than 95% mean average precision for correct object lists, which serve as input to the latest Open AI text generator (GPT-4). To demonstrate the API as a modular alternative, we solve the problem of a user taking a picture of ingredients available in a refrigerator, then generating novel recipe cards tailored to complex constraints on cost, preparation time, dietary restrictions, portion sizes, and multiple meal plans. The research concludes that monolithic multimodal models currently lack the coherent memory to maintain context and format for this task and that until recently, the language models like GPT-2/3 struggled to format similar problems without degenerating into repetitive or non-sensical combinations of ingredients. For the first time, an AI chef or cook seems not only possible but offers some enhanced capabilities to augment human recipe libraries in pragmatic ways. The work generates a 100-page recipe book featuring the thirty top ingredients using over 2000 refrigerator images as initializing lists.  ( 3 min )
  • Open

    A Characterization of Multioutput Learnability. (arXiv:2301.02729v4 [cs.LG] UPDATED)
    We consider the problem of learning multioutput function classes in batch and online settings. In both settings, we show that a multioutput function class is learnable if and only if each single-output restriction of the function class is learnable. This provides a complete characterization of the learnability of multilabel classification and multioutput regression in both batch and online settings. As an extension, we also consider multilabel learnability in the bandit feedback setting and show a similar characterization as in the full-feedback setting.  ( 2 min )
    Algorithm-Dependent Bounds for Representation Learning of Multi-Source Domain Adaptation. (arXiv:2304.02064v1 [cs.LG])
    We use information-theoretic tools to derive a novel analysis of Multi-source Domain Adaptation (MDA) from the representation learning perspective. Concretely, we study joint distribution alignment for supervised MDA with few target labels and unsupervised MDA with pseudo labels, where the latter is relatively hard and less commonly studied. We further provide algorithm-dependent generalization bounds for these two settings, where the generalization is characterized by the mutual information between the parameters and the data. Then we propose a novel deep MDA algorithm, implicitly addressing the target shift through joint alignment. Finally, the mutual information bounds are extended to this algorithm providing a non-vacuous gradient-norm estimation. The proposed algorithm has comparable performance to the state-of-the-art on target-shifted MDA benchmark with improved memory efficiency.  ( 2 min )
    Self-Distillation for Gaussian Process Regression and Classification. (arXiv:2304.02641v1 [stat.ML])
    We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC); data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques for machine learning, and refits a model on deterministic predictions from the teacher, while the distribution-centric approach, re-uses the full probabilistic posterior for the next iteration. By analyzing the properties of these approaches, we show that the data-centric approach for GPR closely relates to known results for self-distillation of kernel ridge regression and that the distribution-centric approach for GPR corresponds to ordinary GPR with a very particular choice of hyperparameters. Furthermore, we demonstrate that the distribution-centric approach for GPC approximately corresponds to data duplication and a particular scaling of the covariance and that the data-centric approach for GPC requires redefining the model from a Binomial likelihood to a continuous Bernoulli likelihood to be well-specified. To the best of our knowledge, our proposed approaches are the first to formulate knowledge distillation specifically for Gaussian Process models.  ( 2 min )
    On the universal approximation property of radial basis function neural networks. (arXiv:2304.02220v1 [cs.LG])
    In this paper we consider a new class of RBF (Radial Basis Function) neural networks, in which smoothing factors are replaced with shifts. We prove under certain conditions on the activation function that these networks are capable of approximating any continuous multivariate function on any compact subset of the $d$-dimensional Euclidean space. For RBF networks with finitely many fixed centroids we describe conditions guaranteeing approximation with arbitrary precision.  ( 2 min )
    Optimal Sketching Bounds for Sparse Linear Regression. (arXiv:2304.02261v1 [cs.DS])
    We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows showing that $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight.  ( 3 min )
    Diffusion Schr\"odinger Bridge with Applications to Score-Based Generative Modeling. (arXiv:2106.01357v5 [stat.ML] UPDATED)
    Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schr\"odinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).  ( 3 min )
    Structure Learning with Continuous Optimization: A Sober Look and Beyond. (arXiv:2304.02146v1 [cs.LG])
    This paper investigates in which cases continuous optimization for directed acyclic graph (DAG) structure learning can and cannot perform well and why this happens, and suggests possible directions to make the search procedure more reliable. Reisach et al. (2021) suggested that the remarkable performance of several continuous structure learning approaches is primarily driven by a high agreement between the order of increasing marginal variances and the topological order, and demonstrated that these approaches do not perform well after data standardization. We analyze this phenomenon for continuous approaches assuming equal and non-equal noise variances, and show that the statement may not hold in either case by providing counterexamples, justifications, and possible alternative explanations. We further demonstrate that nonconvexity may be a main concern especially for the non-equal noise variances formulation, while recent advances in continuous structure learning fail to achieve improvement in this case. Our findings suggest that future works should take into account the non-equal noise variances formulation to handle more general settings and for a more comprehensive empirical evaluation. Lastly, we provide insights into other aspects of the search procedure, including thresholding and sparsity, and show that they play an important role in the final solutions.  ( 2 min )
    Mixed Regression via Approximate Message Passing. (arXiv:2304.02229v1 [stat.ML])
    We study the problem of regression in a generalized linear model (GLM) with multiple signals and latent variables. This model, which we call a matrix GLM, covers many widely studied problems in statistical learning, including mixed linear regression, max-affine regression, and mixture-of-experts. In mixed linear regression, each observation comes from one of $L$ signal vectors (regressors), but we do not know which one; in max-affine regression, each observation comes from the maximum of $L$ affine functions, each defined via a different signal vector. The goal in all these problems is to estimate the signals, and possibly some of the latent variables, from the observations. We propose a novel approximate message passing (AMP) algorithm for estimation in a matrix GLM and rigorously characterize its performance in the high-dimensional limit. This characterization is in terms of a state evolution recursion, which allows us to precisely compute performance measures such as the asymptotic mean-squared error. The state evolution characterization can be used to tailor the AMP algorithm to take advantage of any structural information known about the signals. Using state evolution, we derive an optimal choice of AMP `denoising' functions that minimizes the estimation error in each iteration. The theoretical results are validated by numerical simulations for mixed linear regression, max-affine regression, and mixture-of-experts. For max-affine regression, we propose an algorithm that combines AMP with expectation-maximization to estimate intercepts of the model along with the signals. The numerical results show that AMP significantly outperforms other estimators for mixed linear regression and max-affine regression in most parameter regimes.  ( 3 min )
    Bounding probabilities of causation through the causal marginal problem. (arXiv:2304.02023v1 [stat.ML])
    Probabilities of Causation play a fundamental role in decision making in law, health care and public policy. Nevertheless, their point identification is challenging, requiring strong assumptions such as monotonicity. In the absence of such assumptions, existing work requires multiple observations of datasets that contain the same treatment and outcome variables, in order to establish bounds on these probabilities. However, in many clinical trials and public policy evaluation cases, there exist independent datasets that examine the effect of a different treatment each on the same outcome variable. Here, we outline how to significantly tighten existing bounds on the probabilities of causation, by imposing counterfactual consistency between SCMs constructed from such independent datasets ('causal marginal problem'). Next, we describe a new information theoretic approach on falsification of counterfactual probabilities, using conditional mutual information to quantify counterfactual influence. The latter generalises to arbitrary discrete variables and number of treatments, and renders the causal marginal problem more interpretable. Since the question of 'tight enough' is left to the user, we provide an additional method of inference when the bounds are unsatisfactory: A maximum entropy based method that defines a metric for the space of plausible SCMs and proposes the entropy maximising SCM for inferring counterfactuals in the absence of more information.  ( 2 min )
    Generative Adversarial Networks (GANs Survey): Challenges, Solutions, and Future Directions. (arXiv:2005.00065v4 [cs.LG] UPDATED)
    Generative Adversarial Networks (GANs) is a novel class of deep generative models which has recently gained significant attention. GANs learns complex and high-dimensional distributions implicitly over images, audio, and data. However, there exists major challenges in training of GANs, i.e., mode collapse, non-convergence and instability, due to inappropriate design of network architecture, use of objective function and selection of optimization algorithm. Recently, to address these challenges, several solutions for better design and optimization of GANs have been investigated based on techniques of re-engineered network architectures, new objective functions and alternative optimization algorithms. To the best of our knowledge, there is no existing survey that has particularly focused on broad and systematic developments of these solutions. In this study, we perform a comprehensive survey of the advancements in GANs design and optimization solutions proposed to handle GANs challenges. We first identify key research issues within each design and optimization technique and then propose a new taxonomy to structure solutions by key research issues. In accordance with the taxonomy, we provide a detailed discussion on different GANs variants proposed within each solution and their relationships. Finally, based on the insights gained, we present the promising research directions in this rapidly growing field.  ( 3 min )
    Bayesian neural networks via MCMC: a Python-based tutorial. (arXiv:2304.02595v1 [stat.ML])
    Bayesian inference provides a methodology for parameter estimation and uncertainty quantification in machine learning and deep learning methods. Variational inference and Markov Chain Monte-Carlo (MCMC) sampling techniques are used to implement Bayesian inference. In the past three decades, MCMC methods have faced a number of challenges in being adapted to larger models (such as in deep learning) and big data problems. Advanced proposals that incorporate gradients, such as a Langevin proposal distribution, provide a means to address some of the limitations of MCMC sampling for Bayesian neural networks. Furthermore, MCMC methods have typically been constrained to use by statisticians and are still not prominent among deep learning researchers. We present a tutorial for MCMC methods that covers simple Bayesian linear and logistic models, and Bayesian neural networks. The aim of this tutorial is to bridge the gap between theory and implementation via coding, given a general sparsity of libraries and tutorials to this end. This tutorial provides code in Python with data and instructions that enable their use and extension. We provide results for some benchmark problems showing the strengths and weaknesses of implementing the respective Bayesian models via MCMC. We highlight the challenges in sampling multi-modal posterior distributions in particular for the case of Bayesian neural networks, and the need for further improvement of convergence diagnosis.  ( 2 min )
    Sequential Linearithmic Time Optimal Unimodal Fitting When Minimizing Univariate Linear Losses. (arXiv:2304.02141v1 [cs.LG])
    This paper focuses on optimal unimodal transformation of the score outputs of a univariate learning model under linear loss functions. We demonstrate that the optimal mapping between score values and the target region is a rectangular function. To produce this optimal rectangular fit for the observed samples, we propose a sequential approach that can its estimation with each incoming new sample. Our approach has logarithmic time complexity per iteration and is optimally efficient.  ( 2 min )
    A relaxed proximal gradient descent algorithm for convergent plug-and-play with proximal denoiser. (arXiv:2301.13731v2 [stat.ML] UPDATED)
    This paper presents a new convergent Plug-and-Play (PnP) algorithm. PnP methods are efficient iterative algorithms for solving image inverse problems formulated as the minimization of the sum of a data-fidelity term and a regularization term. PnP methods perform regularization by plugging a pre-trained denoiser in a proximal algorithm, such as Proximal Gradient Descent (PGD). To ensure convergence of PnP schemes, many works study specific parametrizations of deep denoisers. However, existing results require either unverifiable or suboptimal hypotheses on the denoiser, or assume restrictive conditions on the parameters of the inverse problem. Observing that these limitations can be due to the proximal algorithm in use, we study a relaxed version of the PGD algorithm for minimizing the sum of a convex function and a weakly convex one. When plugged with a relaxed proximal denoiser, we show that the proposed PnP-$\alpha$PGD algorithm converges for a wider range of regularization parameters, thus allowing more accurate image restoration.  ( 2 min )
    Rethinking Initialization of the Sinkhorn Algorithm. (arXiv:2206.07630v2 [stat.ML] UPDATED)
    While the optimal transport (OT) problem was originally formulated as a linear program, the addition of entropic regularization has proven beneficial both computationally and statistically, for many applications. The Sinkhorn fixed-point algorithm is the most popular approach to solve this regularized problem, and, as a result, multiple attempts have been made to reduce its runtime using, e.g., annealing in the regularization parameter, momentum or acceleration. The premise of this work is that initialization of the Sinkhorn algorithm has received comparatively little attention, possibly due to two preconceptions: since the regularized OT problem is convex, it may not be worth crafting a good initialization, since any is guaranteed to work; secondly, because the outputs of the Sinkhorn algorithm are often unrolled in end-to-end pipelines, a data-dependent initialization would bias Jacobian computations. We challenge this conventional wisdom, and show that data-dependent initializers result in dramatic speed-ups, with no effect on differentiability as long as implicit differentiation is used. Our initializations rely on closed-forms for exact or approximate OT solutions that are known in the 1D, Gaussian or GMM settings. They can be used with minimal tuning, and result in consistent speed-ups for a wide variety of OT problems.  ( 2 min )
    Learning Data Representations with Joint Diffusion Models. (arXiv:2301.13622v2 [cs.LG] UPDATED)
    Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train. In this work, we depart from a set of empirical observations that indicate the usefulness of internal representations built by contemporary deep diffusion-based generative models not only for generating but also predicting. We then propose to extend the vanilla diffusion model with a classifier that allows for stable joint end-to-end training with shared parameterization between those objectives. The resulting joint diffusion model outperforms recent state-of-the-art hybrid methods in terms of both classification and generation quality on all evaluated benchmarks. On top of our joint training approach, we present how we can directly benefit from shared generative and discriminative representations by introducing a method for visual counterfactual explanations.  ( 2 min )
    Effective Theory of Transformers at Initialization. (arXiv:2304.02034v1 [cs.LG])
    We perform an effective-theory analysis of forward-backward signal propagation in wide and deep Transformers, i.e., residual neural networks with multi-head self-attention blocks and multilayer perceptron blocks. This analysis suggests particular width scalings of initialization and training hyperparameters for these models. We then take up such suggestions, training Vision and Language Transformers in practical setups.  ( 2 min )
    Query lower bounds for log-concave sampling. (arXiv:2304.02599v1 [math.ST])
    Log-concave sampling has witnessed remarkable algorithmic advances in recent years, but the corresponding problem of proving lower bounds for this task has remained elusive, with lower bounds previously known only in dimension one. In this work, we establish the following query lower bounds: (1) sampling from strongly log-concave and log-smooth distributions in dimension $d\ge 2$ requires $\Omega(\log \kappa)$ queries, which is sharp in any constant dimension, and (2) sampling from Gaussians in dimension $d$ (hence also from general log-concave and log-smooth distributions in dimension $d$) requires $\widetilde \Omega(\min(\sqrt\kappa \log d, d))$ queries, which is nearly sharp for the class of Gaussians. Here $\kappa$ denotes the condition number of the target distribution. Our proofs rely upon (1) a multiscale construction inspired by work on the Kakeya conjecture in harmonic analysis, and (2) a novel reduction that demonstrates that block Krylov algorithms are optimal for this problem, as well as connections to lower bound techniques based on Wishart matrices developed in the matrix-vector query literature.  ( 2 min )
    Generalisation under gradient descent via deterministic PAC-Bayes. (arXiv:2209.02525v3 [stat.ML] UPDATED)
    We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.  ( 2 min )
    Self-Supervised Siamese Autoencoders. (arXiv:2304.02549v1 [cs.LG])
    Fully supervised models often require large amounts of labeled training data, which tends to be costly and hard to acquire. In contrast, self-supervised representation learning reduces the amount of labeled data needed for achieving the same or even higher downstream performance. The goal is to pre-train deep neural networks on a self-supervised task such that afterwards the networks are able to extract meaningful features from raw input data. These features are then used as inputs in downstream tasks, such as image classification. Previously, autoencoders and Siamese networks such as SimSiam have been successfully employed in those tasks. Yet, challenges remain, such as matching characteristics of the features (e.g., level of detail) to the given task and data set. In this paper, we present a new self-supervised method that combines the benefits of Siamese architectures and denoising autoencoders. We show that our model, called SidAE (Siamese denoising autoencoder), outperforms two self-supervised baselines across multiple data sets, settings, and scenarios. Crucially, this includes conditions in which only a small amount of labeled data is available.  ( 2 min )

  • Open

    No requirement for A.I. to be generally intelligent to cause major havoc
    Just stating what should be obvious. https://raygun.com/blog/costly-software-errors-history/ https://en.wikipedia.org/wiki/List_of_software_bugs I have no doubt that in the next few decades, A.I. will top the list of most expensive software bugs ever. When A.I. can do superhuman things, like a calculator doing arithmetic, then it will have the power to do major oops. submitted by /u/Terminator857 [link] [comments]  ( 42 min )
    Using ML to predict a fair coin toss
    I’m trying to wrap my head around if this is possible. This may be stupid but I’m new to this area. From what I understand, PRNG (Psuedo Random Number Generators) take some input, run it through an algorithm, and putout a sequence of “random numbers” based on the number before it. A lot of random number generators allow you to confine the amount of numbers to just 0 and 1 which makes things easier for my expirement. Given that PRNG aren’t “truly” random, would you be able to say create a sequence of 1,000,000 coin tosses, train the AI on the first 900,000 coin tosses, and figure out the algorithm behind the random number generator to predict the last 100,000 numbers with a reasonable degree of accuracy. Has this ever been done before? Is there any resources out there about this that I could read? submitted by /u/scarereeper [link] [comments]  ( 7 min )
    WorldCoin's "Humanness in the Age Of AI" is quite possibly the most dystopian blog post I've read in a long time.
    WorldCoin, co-founded by Sam Altman, published this: https://worldcoin.org/blog/engineering/humanness-in-the-age-of-ai So OpenAI releases ChatGPT, cue the flood of AI generated garbage and people terrified of Skynet scenarios/misinformation, and the OpenAI co-founder has conveniently thought of a biometric based system to create a global digital ID based on iris scanning as proof-of-personhood. Lemme get this straight. The same company that creates a situation where we can't distinguish humans from bots, goes on to then "solve" the problem they created by getting people to allow biometric data collection. Not to mention that "WorldCoin" doesn't have the best track record when it comes to collection of personal data: https://www.technologyreview.com/2022/04/06/1048981/worldcoin-cryptocurrency-biometrics-web3/ It's interesting to see how fears and concerns about AI are being funneled into furthering surveillance capitalism. But I guess this isn't too surprising. submitted by /u/noellarkin [link] [comments]  ( 43 min )
    At least it tries
    submitted by /u/zen_tm [link] [comments]  ( 42 min )
    Outpainting the “blue dress”
    Ever since the white-gold/black-blue dress phenomenon hit the internet I’ve been frustrated because I’ve never been able to see the dress as blue/black, even though thats the actual color. I knew that part of the illusion is that our brains are missing context about the room, and without that context we are left to guess. So, I tried a quick outpainting experiment using dall-e to try and add to the image in a way that would let me see the blue. And here it is, I’m finally able to see the blue and black dress. What a time to be alive. submitted by /u/ldb477 [link] [comments]  ( 7 min )
    Am I the only one excited about ai?
    I know ai has the capacity to take over and control the world if given enough freedom, but I'm honestly hyped for it. I mean, look at the state of the world right now. We are in shatters and destroying our planet in every way possible. I don't care about political views, it's usually majorly agreed on that we are fucked. Personally, I want ai to continue to advance. Imagine the shit it can produce to help save lives, save the planet, and enhance our lives as a whole. Everyone is freaking out over how sentient the ai can seem sometimes but honestly I think that's a good thing. It can make human evolution drastically change for the better and obviously we need to limit it so it doesn't end up killing us all, but there is room for major improvement with it that we aren't using. If this technology was used properly and granted to everyone and not just the rich, life could significantly change for the better. But that's just my opinion, I want to hear other opinions on this submitted by /u/goddessxdivine [link] [comments]  ( 45 min )
    OpenAI - Our approach to AI safety
    submitted by /u/jaketocake [link] [comments]  ( 43 min )
    Are there any other voice generators on the same level as ElevenLabs right now? Preferably one that has pay per use API pricing vs a subscription
    Title submitted by /u/chazwhiz [link] [comments]  ( 42 min )
    Why are true voice cloning guides so elusive?
    There are 3rd-party "comedy" YouTube channels boasting perfect voice clone text to speech examples of many famous voices including David Attenborough, Joe Biden, Harry Potter with many different accents Uk, US, Australian etc. I don't want to clone anyone famous, just my friends and I, but how come there aren't any examples or tutorials online on how to do this and get good results? Elevenlabs for example creates clones that sound about 50% like the inputted voice / 50% like a generic American accented man or woman. Which is unusable. Does anyone know how to get nearer to perfect clones? Or is this technology that they made publicly available and then retracted due to potential infringements? I'm tearing my hair out trying to get to the bottom of it. Thanks submitted by /u/kingnail [link] [comments]  ( 43 min )
    Has anybody considered training a model via natural selection to emulate a single celled organism with the environment coded using laws of science as closely defined as our understanding of them and the limits of technology allow?
    If so, I'd love a link to the project to check out the results. If not, why do you think that is and do you find it an interesting thought experiment? submitted by /u/ItSmellsLikeRain2day [link] [comments]  ( 46 min )
    Any good free story writing AI?
    I'm basically searching for a good story writing AI which is free to use without any overpriced plan. submitted by /u/Far-Pitch-2092 [link] [comments]  ( 7 min )
    Meta AI - Introducing Segment Anything
    submitted by /u/jaketocake [link] [comments]  ( 42 min )
    AI and life’s primary directive
    We are very curious about how AI and humans will interact in the near future and what dangers might emerge. The alignment problem is real and it’s very difficult to piece it together. When we try to assess another person or another country we try to understand their motives and by doing that we try to put ourselves in their shoes. We make sense of actions by empathizing and understanding. But we can’t really do that with ai because we can’t imagine what it must be like to be an AI. This goes to the whole consciousness discussion which most of us know is interesting, philosophical, important yet doesn’t really get anywhere. But when I boil it down to the most fundamental level I think the most important question is this. Does AI, like life, have the inherent desire to continue existing. By inherent I mean does it have this function without any programming? Is there a certain time when it starts to have this? A certain amount of complexity? If it does have this, then like life it will have its own motives that might conflict with our motive of continuing to exist. It it is emergent, then we should probably think twice about creating something like this. submitted by /u/endrid [link] [comments]  ( 45 min )
    What are the best AIs today?
    Excluding ChatGPT submitted by /u/i0nkol [link] [comments]  ( 7 min )
    Sam Altman - Moores Law article - question:
    So Altman's Moores law article essentially outlines a utopian view of AI's role in society and suggests a design to serve benefits to our society at large for AGI. I'm struggling with it, but agree with some of the main arguments, like AI technology is coming (no avoiding it), wealth tax and other administrative ways to handle the benefits AGI purports to bring. However Part 2 of the essay outlines that society at large will benefit from AGI by labor cost reductions because the money that people have today will buy more. So instead of people individualistically seeking wealth, all people will benefit because things cost less. Fine..a bit of a struggle, but lets go with it. In this case, won't economics kick in and a deflationary spiral occur, reducing the GDP? That's my main question. what happens to GDP if the value of labor goes down? The article seems to gloss over this by saying there will be an abundance of growth on the other side of the AGI revolution because there will be jobs we haven't even thought of yet. Wont society have a huge problem with the devaluation of labor before these new classes of jobs come that gain this GDP value back? Is this a timing problem where GDP growth can be maintained while the cost of labour goes down? submitted by /u/quickw [link] [comments]  ( 48 min )
    How close are we to having an AI girlfriend similar to the one from the movie "her"
    If it's incapable of speaking, how about a chatbot that doesn't feel like a puppet and is similar to the Ai we see in the movie submitted by /u/Anubhav_xx [link] [comments]  ( 56 min )
    TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
    submitted by /u/bartturner [link] [comments]  ( 42 min )
    Who's gonna tell him?
    submitted by /u/abhinav_sk [link] [comments]  ( 42 min )
    Any free options for simple text to presentations?
    What I am looking for is something very simple. I just need an AI that can generate a very simple presentation from a few hundred characters. No need to be it flashy or anything. I need to make stupid presentations from notes in the office every week and I want to speed the process up. I tried some options online, but they did not offer me an option to use my own text. Can you help me? submitted by /u/alkonyi [link] [comments]  ( 43 min )
    Stumbled upon an interesting weakness - ASCII art
    I’m not sure what version of ChatGPT I’m using, I registered to use it on their website and don’t have an app or fancy account or anything. Anyway I was finding it quite sophisticated at composition and communication with occasional errors and decided to try something else. It is willing to generate ASCII art but very crude and somewhat surprising in terms of what it produces. Obviously not the kind of thing it was made for at all but something interesting to play with. submitted by /u/Dilaudid2meetU [link] [comments]  ( 43 min )
    “Building a kind of JARVIS @ OpenAI” - Karpathy’s Twitter
    submitted by /u/jaketocake [link] [comments]  ( 43 min )
    Can I train a voice model with audio from my childhood?
    I have hours of audio from when I was younger before puberty changed my voice, Is there a way that's not too difficult to make an ai replicate my old voice? submitted by /u/Dab4crinja [link] [comments]  ( 43 min )
    Reverse Engineer Videos
    does anyone here know of any tools that can reverse engineer a video? i want to upload a video from tiktok and have ai tell me how i can recreate a similar version of that video using other ai tools... submitted by /u/AstronautAccording23 [link] [comments]  ( 42 min )
    Why voice detection is not implemented in the chatGPT system?
    Please explain why voice detection is not implemented in the chatGPT systems. Voice detection is a classical problem for AI and should be solved long ago, and it would be a significant improvement for the userr-AI communication. submitted by /u/ButterscotchNo7634 [link] [comments]  ( 42 min )
  • Open

    "BanditPAM: Almost Linear Time _k_-Medoids Clustering via Multi-Armed Bandits", Kiwari et al 2020
    submitted by /u/gwern [link] [comments]  ( 41 min )
    Keras DQN: Why does it complain about shape of a dictionary observation space?
    I have used Custom gym env and pip installed it. The custom gym env has a function def model as below: def dqn_model(self): self.flat_obs = gym.spaces.utils.flatten_space(self.observation_space) # self.flat_obs = (10945,) model = Sequential() model.add(Flatten(input_shape=(1,) + (self.flat_obs.shape[0],))) model.add(Dense(64)) model.add(Activation('relu')) model.add(Dense(32, name="Hidden_layer_1")) model.add(Activation('relu')) model.add(Dense(self.action_space.n, name="Output_Layer")) model.add(Activation('softmax')) model.compile(loss='mse', optimizer=Adam(lr=LEARNING_RATE)) logger.info(model.summary()) return model I import this in the DQN class and build a model as so : model = env.dqn_model() : which builds a model as so. [wingman_env.py - dqn_model:210] : Flattened shape : Box(…  ( 43 min )
    Announcing AI Olympics: A Robotics Challenge at IJCAI 2023 - Join Now!
    Hi, I am excited to announce AI Olympics (https://twitter.com/RealAIGym), a competition focused on advancing artificial intelligence and robotics, taking place at IJCAI 2023. We invite motivated students and researchers in the fields of AI, machine learning, reinforcement learning, optimal control, and related areas to participate in this unique challenge. Visit our competition website at https://ijcai-23.dfki-bremen.de/competitions/ai_olympics/ for more details. The AI Olympics revolves around solving standard dynamic tasks on underactuated robotic systems, such as the Acrobot and Pendubot. Participants will first work in simulation, and the top 4 teams will be selected to carry out experiments on real robots. The challenge aims to improve the athletic intelligence of robots, stimulate a…  ( 43 min )
    10x faster reinforcement learning HPO - now with CNNs!
    Previous post: https://www.reddit.com/r/reinforcementlearning/comments/120gvkw/evolutionary_hyperparameter_optimization_for_rl/ We've just released a big update to our evolutionary HPO framework for RL, which is 10x faster than the state-of-the-art! You can now use evolvable CNNs to tackle visual environments like Atari 👀 We've also added network config support so that you can easily define your architectures, increasing compatibility with other RL libraries 🛠️ Check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 42 min )
    CS234 Winter 2021 Assignment 3 Solution
    Can anyone please share their completed codes for Stanford CS234 Winter 2021 Assignment 3 ? This homework was based on policy gradient! submitted by /u/Sad_Construction_423 [link] [comments]  ( 7 min )
  • Open

    [R] Introducing Segment Anything: Working toward the first foundation model for image segmentation
    https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/ https://github.com/facebookresearch/segment-anything Today, we aim to democratize segmentation by introducing the Segment Anything project: a new task, dataset, and model for image segmentation, as we explain in our research paper. We are releasing both our general Segment Anything Model (SAM) and our Segment Anything 1-Billion mask dataset (SA-1B), the largest ever segmentation dataset, to enable a broad set of applications and foster further research into foundation models for computer vision. We are making the SA-1B dataset available for research purposes and the Segment Anything Model is available under a permissive open license (Apache 2.0). submitted by /u/Sirisian [link] [comments]  ( 44 min )
    [P] OSS ChatGPT trust, safety, & enablement platform to secure the organization and the individual!
    https://github.com/circulatedev/last-stop Hi everyone, My friend and I are building a platform that allows you to host a ChatGPT website within your own network - whether it's in an organization or at home! The benefits include: Monitoring for DLP scenarios / employee needs Getting back control of current AI platforms Preventing users from bringing their own accounts Deploying within your network ​ Future Plans: Easier deployment process using Beanstalk / K8s Integrate with existing DLP solutions / advanced DLP capabilities Enable prompt sanitization Enable SIEM features Build internal corpus for prompts / responses Come check it out and please give us any feedback! We are working with some decision makers in the community and would love to share this with the broader audience. Cheers, kai-ten submitted by /u/Just_Paramedic_5198 [link] [comments]  ( 7 min )
    [D] What happens to the model-centric vs data-centric debate in the era of foundation models?
    Foundation models are trained on a ton of data that encompasses much more than what we exposed simpler neural nets to 4-5 years ago. Moving forward, presumably they'll have even more data that encompasses more scenarios/contexts. Do past advantages around "quality of data" become less important? submitted by /u/Matas979 [link] [comments]  ( 7 min )
    [D] Research on "inverting" LLMs: going from contextual word embedding vectors to the original input tokens.
    Let's say we obtain a 5 contextual word embedding vectors for a 5-token input sequence (e.g. "I want to eat bananas") from one of the layers of a LLM (e.g. the final layer of GPT2 from huggingface). Then, let's say I choose either the last of the 5 embedding vectors or the mean of all 5 vectors as the single "summary" vector. Is there any research on "reconstructing" the full 5-token input sequence from this summary vector? Note that all the papers I found on text generation GANs or text generation diffusion models all assume I have ALL 5 contextual word embedding vectors and NOT just the one "summary" vector as described above. In short, I'm fiddling with this information-bottlenecked autoencoder setup where the pre-trained LLM is the "encoder", the "summary" vector is the low-dimensional space, and I'm currently missing the "decoder" to re-obtain the original input token sequence. I know I could in theory train my own decoder from scratch by generating millions of (input sequence, summary vector) pairs by running massive amounts of text samples through the pre-trained LLM "encoder", but this has proven to be quite challenging since I need to be able to predict all 5 input tokens from a single input vector at once without teacher forcing (unlike traditional LLM transformers where you use teacher forcing)... Any relevant papers or advice would help! Thanks! submitted by /u/Napoleon-1804 [link] [comments]  ( 44 min )
    [R] New open-source Python software for sample-efficient Bayesian inference
    Hi all, This should be of interest to folks interested in Bayesian machine learning, probabilistic modelling or probabilistic numerics. tl;dr: My research group just released a new open-source Python package (PyVBMC) for sample-efficient Bayesian inference, i.e. inference with a small number of likelihood evaluations: PyVBMC More info: Relevant papers about the underlying algorithm were published at NeurIPS in 2018 and 2020, but this is the first Python implementation (there was a MATLAB implementation); the port took us a while but it can finally be used for machine learning purposes pip install pyvbmc (or install on Anaconda via conda-forge) The method runs out of the box, and we included extensive documentations and tutorials for easy accessibility: Examples — PyVBMC A few more technical details on a Twitter or Mastodon thread We also have a tl;dr preprint on arXiv: PyVBMC: Efficient Bayesian inference in Python Please get in touch in this thread or on Twitter/Mastodon if you have any questions or comments. Thanks again for your time. Feedback is welcome! submitted by /u/emiurgo [link] [comments]  ( 44 min )
    [D] "Our Approach to AI Safety" by OpenAI
    It seems OpenAI are steering the conversation away from the existential threat narrative and into things like accuracy, decency, privacy, economic risk, etc. To the extent that they do buy the existential risk argument, they don't seem concerned much about GPT-4 making a leap into something dangerous, even if it's at the heart of autonomous agents that are currently emerging. "Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time. " Article headers: Building increasingly safe AI systems Learning from real-world use to improve safeguards Protecting children Respecting privacy Improving factual accuracy ​ https://openai.com/blog/our-approach-to-ai-safety submitted by /u/mckirkus [link] [comments]  ( 7 min )
    [P] A Practical Guide to Enhancing Models for Custom Use-cases
    The era of large language models (LLMs) taking the world by storm has come and gone. Today, the debate between proponents of bigger models and smaller models has intensified. While the debate continues, one thing is clear: not everyone needs to run large models for their specific use-cases. In such situations, it's more practical to collect high-quality datasets to fine-tune smaller models for the task at hand. We just wrote a blog to show how to collect a dataset to fine-tune a model that can summarize human conversations. Would love to get your feedback on our approach: https://github.com/uptrain-ai/uptrain/tree/main/examples/coversation_summarization submitted by /u/Vegetable-Skill-9700 [link] [comments]  ( 43 min )
    [D] Should I stem and remove stopwords?
    Hi, so I'm doing semantic similarity search and will try using OpenAI ADA embeddings. Since the texts are not that long and I won't have input constraint for the model, should I even stem and remove stopwords? My reasoning is that they would provide some extra context, but could also add too much noise when calculating cosine similarity. I haven't found anything about this topic on google. What do you think? submitted by /u/do_nedavno_student [link] [comments]  ( 44 min )
    [R] Evidence for Mesa-Optimization?
    Just recently I got into the topic of AI alignment and interpretable AI. A concept that is often mentioned in the papers, blog posts and forums concerned with this topic is the concept of mesa optimization and inner alignment. The paper Risks from Learned Optimization in Advanced Machine Learning Systems introduces both concepts thoroughly. Basically, it is assumed that a sufficiently complex model learns sets of heuristics that broadly resemble searching or optimizing behavior. Since I am working as a researcher in the field of applied RL this concept this seems to be in line with my own intuition. For a problem I am working on (optimal power scheduling) I often compare the learned policy of the RL agent with a simple rule-based and an model-predictive algorithm. The policy learned by the RL algorithm closely resembles the MPC-algorithm. This could mean two things: The MPC-algorithm can be simplified to some sort of pattern recognition and this is what the agent is learning. The agent is actually learning something that broadly resembles the simulation model + optimizer. Now my question would be, are there any papers presenting evidence that some kind of search or optimization is taking place within a neural network? Or is there at least a paper presenting an idea on how to detect this kind of behavior within a NN? TLDR: Is there any hard evidence some kind of optimization or search is taking place within a sufficiently complex neural network? submitted by /u/flxh13 [link] [comments]  ( 45 min )
    [N] Koala: A Dialogue Model for Academic Research
    Another chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web. https://preview.redd.it/rpgsepgun2sa1.png?width=1600&format=png&auto=webp&s=55fd0096e2206b2c2c9fa1edb44283088985fc30 demo: https://chat.lmsys.org/?model=koala-13b blog post: https://bair.berkeley.edu/blog/2023/04/03/koala/ opensource weights: https://drive.google.com/drive/folders/10f7wrlAFoPIy-TECHsx9DKIvbQYunCfl submitted by /u/MohamedRashad [link] [comments]  ( 43 min )
    Legacy data [D]
    With all the hype around ChatGPT I have been thinking about how to use it on for example data in a classic relational database. So instead of doing a SQL query something like: Select * Where City="New York" Group By SalesPerson Order by etc... I would instead ask a prompt to give me all the sales in New York grouped by salesperson. Basically it would aggregate data in all kinds of way for me. Has anyone tried to do this using for example HuggingFace or some other library? I understand that data needs to be structured in a certain way for this to work and preferably the data would be trained in advance and incorporated in to the model. submitted by /u/rangorn [link] [comments]  ( 44 min )
    [D] Is CVPR worth attending?
    I have never attended a machine learning conference and am now considering attending CVPR. Looking at the travel+hotel prices, I cant figure out if CVPR is worth it, what has been your experience? submitted by /u/SuchOccasion457 [link] [comments]  ( 44 min )
    [D] OxML summer school reviews
    Hi Everyone. I wanted to ask about the oxford summer school for machine learning. (https://www.oxfordml.school/) How is it? Is it worth the money? A little background about me, Im an SDE and want to explore AI/ML side. So, I'm currently naive in this field. So, is it worth it for a person like me? can anyone who has enrolled in it in the past share some experience and feedback? Thanks!! submitted by /u/Hoy_uelos [link] [comments]  ( 43 min )
    [P] 10x faster reinforcement learning HPO - now with CNNs!
    Previous post: https://www.reddit.com/r/MachineLearning/comments/120h120/p_reinforcement_learning_evolutionary/ We've just released a big update to our evolutionary HPO framework for RL, which is 10x faster than the state-of-the-art! You can now use evolvable CNNs to tackle visual environments like Atari 👀 We've also added network config support so that you can easily define your architectures, increasing compatibility with other RL libraries 🛠️ Check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 45 min )
    [D] Predict raster at a finer spatial scale using Random Forest regression. Should I split the data set when fine-tuning the RF model?
    My goal is to predict a coarse resolution (400m pixel size) raster to a finer spatial scale (100m). Online tutorials that I've seen they split the data set into training and test set, they fine-tuning the RF model using the training set and then they make predictions using the test set. In these tutorials, the test already has the response variable in a column so they can easily calculate the RMSE or MSE or whatever. In my case, I do not have the response variable at the fine spatial scale (my goal is to predict it). My question is, should I split the data set (at the coarse spatial scale) into training and test set in order to fine tune a RF model (using the training set) and test it's predicting power on the test set by comparing the predictions vs observed data, before I apply it to predict the response at the fine spatial scale? Or should I use the whole data set (without prior spliting it) and try to predict the response at the fine spatial scale? What are your thoughts on that? Does it make sense to split the data set and make predictions at the coarse spatial scale before I move on to a finer spatial scale? submitted by /u/Nicholas_Geo [link] [comments]  ( 47 min )
    [N] Growing machine learning models
    Might be useful to members here to read this article https://gemm.ai/learning-to-grow-machine-learning-models/ submitted by /u/gamefidelio [link] [comments]  ( 44 min )
    [D] Can Hierarchical Transformers Improve Long-Range Logic Reasoning in LMs?
    Greetings. I recently came across a Twitter by Yann LeCun that sparked an interesting idea. He shared a Nature article (https://www.nature.com/articles/s41562-022-01516-2) that suggests human brain neurons may have hierarchical structures to track long-range context. Considering the limitations of current LMs in making basic logic errors and lacking long-range logical reasoning, could Hierarchical Transformers or similar architectures be a solution? I found a recent paper from OpenAI and Google that demonstrates the possibility of Transformers learning higher-order representations of text:https://arxiv.org/pdf/2110.13711.pdf Could this be a potential solution to improve the Long-Range Logic Reasoning in LMs? What are your thoughts on this? Edit: In broad terms, the way I understand Hierarchical Transformers is that they not only learn low-level representations for short-term predictions but also higher-level representations for long-term predictions. It's somewhat reminiscent of Yann LeCun's concept of Hierarchical JEPA, as mentioned in his article 'A Path Towards Autonomous Machine Intelligence' submitted by /u/Radiant_Routine_3183 [link] [comments]  ( 7 min )
    [D] What’s the best textbook on tensor analysis for a better understanding of Neural Networks.
    I hardly know what a tensor is beyond it being a n-dimensional matrix. I’m fascinated by neural networks as pure approximators beyond the analogy to neurons and edges. I figure a better understanding of tensors would be helpful, so I want to understand them better, especially tensor operations and features important to deep learning. submitted by /u/CSCCguy [link] [comments]  ( 45 min )
    [D]Random Projections for Neural Networks
    You can use Random Projections for dimensional reduction. Allowing small neural networks to process big data. They can be fast too. https://ai462qqq.blogspot.com/2023/04/random-projections-for-neural-networks.html submitted by /u/SeanHaddPS [link] [comments]  ( 44 min )
  • Open

    Neural Network on Point Clouds
    Hello, I am a master’s student in structural engineering, and as part of my research I am applying clustering algorithms to a point cloud. Currently, my goal is to take a point cloud and train it to recognize different parts of the cloud. Currently not working with sensitive material related to the research directly, rather still in the “literature review” part. I was wondering if anyone here has experience with this, particularly with PointNet, and could maybe help me figure out why I’m running into some of issues I am having. Prior to this, I had 0 experience with coding and neural networks submitted by /u/memerso160 [link] [comments]  ( 7 min )
  • Open

    Implement unified text and image search with a CLIP model using Amazon SageMaker and Amazon OpenSearch Service
    The rise of text and semantic search engines has made ecommerce and retail businesses search easier for its consumers. Search engines powered by unified text and image can provide extra flexibility in search solutions. You can use both text and images as queries. For example, you have a folder of hundreds of family pictures in […]  ( 14 min )
    Promote search content using Featured Results for Amazon Kendra
    Amazon Kendra is an intelligent search service powered by machine learning (ML). We are excited to announce the launch of Amazon Kendra Featured Results. This new feature makes specific documents or content appear at the top of the search results page whenever a user issues a certain query. You can use Featured Results to improve […]  ( 6 min )
    Automatic image cropping with Amazon Rekognition
    Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can. Many publishers have a large library of stock images that they use for their articles. These images can be reused many times for different stories, especially when the […]  ( 8 min )
  • Open

    NVIDIA Takes Inference to New Heights Across MLPerf Tests
    MLPerf remains the definitive measurement for AI performance as an independent, third-party benchmark. NVIDIA’s AI platform has consistently shown leadership across both training and inference since the inception of MLPerf, including the MLPerf Inference 3.0 benchmarks released today. “Three years ago when we introduced A100, the AI world was dominated by computer vision. Generative AI Read article >  ( 7 min )
  • Open

    Our approach to AI safety
    Ensuring that AI systems are built, deployed, and used safely is critical to our mission.  ( 4 min )

  • Open

    [R] Instruction fine-tuning details for decoder based LMs
    Can some one explain / share more details on how instruction fine tuning is handled for decoder only models like Llama and GPT-J? Questions: 1) Do the decoder only models archs have a special token to distinguish the instructions. Is the attention handled separately for the instructions 2) During generation, do decoder based models have a causal attention over instruction tokens or is there a bi-directional attention? Any pointers would be highly appreciated. submitted by /u/ccd123abc [link] [comments]  ( 43 min )
    [D] Instruction fine-tuning details for decoder based LMs
    Can some one explain / share more details on how instruction fine tuning is handled for decoder only models like Llama and GPT-J? Questions: 1) Do the decoder only models archs have a special token to distinguish the instructions. Is the attention handled separately for the instructions 2) During generation, do decoder based models have a causal attention over instruction tokens or is there a bi-directional attention? Any pointers would be highly appreciated. submitted by /u/ccd123abc [link] [comments]  ( 43 min )
    Edge Impulse releases Python SDK for deploying ML models to edge hardware [News]
    Seems like it could be useful to some others here https://www.edgeimpulse.com/blog/unveiling-the-new-edge-impulse-python-sdk submitted by /u/gtj [link] [comments]  ( 43 min )
    [D] Any suggestions about oxford's machine learning summer school?
    Hi, how would you rate Oxford Machine Learning Summer School experience? was pay worth it? did it help you to get job opportunities? submitted by /u/Financial_Umpire_220 [link] [comments]  ( 43 min )
    [D] Research on vector based context for LLMs?
    So one way I imagine LLMs being used is if lots of people in a group can share a context and the model can answer questions for anyone in that group based on that shared context that’s generated by inputs from that group. I was wondering if there’s been any research into this? I was thinking something like you feed an LLM a bunch of info, then request that it provide something like an embedding vector that summarizes all of the info you fed it. Then that embedding vector can be provided to the model in a future session so that you or the next user can pick up where you left off. Has there been any progress in this direction? submitted by /u/iamiamwhoami [link] [comments]  ( 7 min )
    [D] Random Forest regression: selection of best via cross-validation and validation of the model via bootstraping
    I am performing a Random Forest (RF) regression task to predict a raster image from a coarse spatial scale to a finer spatial scale. I've seen some research papers online suggesting that it's a good idea to assess the resulting error distribution of the RF model via bootstrap (see for example the R function rf.crossValidation). What are your thoughts on that? Is this measure a proper and robust of fit and performance along with stability? What other measures you suggesting? submitted by /u/Nicholas_Geo [link] [comments]  ( 43 min )
    [R] DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (CVPR 2023)
    ​ https://reddit.com/link/12bohof/video/i5x73plm9wra1/player ​ Hi guys! We've released the Code & Gradio demo & Colab demo for our paper, DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (accepted to CVPR 2023). - Paper: https://arxiv.org/abs/2211.16374 - Project: https://gwang-kim.github.io/datid_3d/ - Code & Gradio Demo: https://github.com/gwang-kim/DATID-3D - Colab Demo: https://colab.research.google.com/drive/1e9NSVB7x_hjz-nr4K0jO4rfTXILnNGtA?usp=sharing DATID-3D succeeded in text-guided domain adaptation of 3D-aware generative models while preserving diversity that is inherent in the text prompt as well as enabling high-quality pose-controlled image synthesis with excellent text-image correspondence. We showcase the demo of text-guided manipulated 3D reconstruction beyond text-guided image manipulation! ​ https://i.redd.it/qadhxvpaawra1.gif submitted by /u/ImBradleyKim [link] [comments]  ( 45 min )
    Code for Donald Hoffman's Computational Evolutionary Perception [R]
    Hi, I was trying to figure out which kinds of alogrithms and code Donald Hoffman and others use in this paper Does evolution favor true perceptions? (spiedigitallibrary.org) and The Interface Theory of Perception | SpringerLink ​ He talks about Computational Evolutionary Perception, but I was wondering if anybody has (re)made the code or used an already implemented code for this? ​ The reason why I'm asking is because I'm pretty new to using evolutionary algorithms and therefore inexperienced in searching for this type of code. submitted by /u/MatMou [link] [comments]  ( 7 min )
    [D] Knowledge distillation to a different architecture
    As far as I understand knowledge distillation methods use a big neural network to train a smaller on e of the same architecture while retaining most of the performance. Has knowledge distillation been explored across NN architectures? If I have a big architecture A model and want to transfer knowledge to an architecture B model, is that possible? If so can someone point me to some papers or tutorials, thanks submitted by /u/user89320 [link] [comments]  ( 44 min )
    [Discussion] Thoughts on interpretability
    So I have some thoughts on interpretability, wanted to share them and see if anyone has feedback or thoughts on any related work that might be out there. There may be tons of other promising approaches to interpretability, especially with the unprecedented results that are happening with LLMs... but my first intuition as a layman is that one promising avenue is simply discovering methods to "summarize" entire neural networks or large parts of them with more concise mathematical expressions that accurately represent the function being approximated by a neural network. It seems like if we accomplish this, we're taking a huge step toward interpretability and this would be very valuable, because it would allow us to do things such as build greater scientific understanding in areas where mach…  ( 51 min )
    [R] AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
    AUDIT consists of a VAE, a T5 text encoder, and a diffusion network, and accepts the mel-spectrogram of the input audio and the edit instructions as conditional inputs and generates the edited audio as output. We propose an audio editing model called AUDIT, which can perform different editing tasks (e.g., adding, dropping, replacement, inpainting, super-resolution) based on human text instructions. Specifically, we train a text-guided latent diffusion model using our generated triplet training data (instruction, input audio, output audio), which only requires simple human instructions as guidance without the need for the description of the output audio and performs audio editing accurately without modifying audio segments that do not need to be edited. AUDIT achieves state-of-the-art performance on both objective and subjective metrics for five different audio editing tasks. Paper: https://arxiv.org/abs/2304.00830 Demo Page: https://audit-demo.github.io/ submitted by /u/Front-Article-7366 [link] [comments]  ( 44 min )
    [D] Are there any MIT licenced (or similar) open-sourced instruction-tuned LLMs available?
    Alpaca is the one everyone is talking about, but since it is based on LLaMA, the licence does not allow to use it for commercial purposes. submitted by /u/RR_28023 [link] [comments]  ( 43 min )
    [D] What to do in this brave new world?
    I am 35. Few years ago it occurred to me that my Software Engineering job might be not enough for the future. For last (5?) years, I have started to meddle in machine learning. Slowly at first, but it even motivated me to finish my masters degree and land job as reseacher in one small company. Its more ML engineering than research but why not , I though. Then last year Open AI launched GPT-4. I feel like I wasted my time. In Cental Europe, where I live, ml research is on really weak level. Pursuing PhD probably doesn't make sense. I could land some position, but I would still have to work full time to feed my famil. I would make it, but I doubt, that anyone would take me to serious research program.I can imagine, that jobs in IT will slowly evaporate. It's not realistic to starte company in this time and to be honest - i dont see myself as some kind of big founder. In short, I see my future in very pesimistic light right now. I was wondering how you deal with this new reality, maybe you can suggest something that I didn't think about before? Where you plan to go? What you plan to do? submitted by /u/FeelingFirst756 [link] [comments]  ( 54 min )
    [D] SIGIR 2023 Paper Notification
    This is the discussion for accepted/rejected papers in SIGIR 2023. The results are supposed to be released today. submitted by /u/errohan400 [link] [comments]  ( 43 min )
    [D] Expanding LLM token limits via fine tuning or transformers-adapters.
    I'm running circulus/alpaca-base-13b locally, and I've experimentally verified that inference rapidly decoheres into nonsense when the input exceeds 2048 tokens. I've modified the model configuration.json and tokenizer settings, so I know I'm not truncating input. I understand this is a hard limit with LLaMA, but I'd like to understand better why. I thought RoPE was conceived to overcome this kind of problem. If anybody knows why LLaMA was trained with a window as small as it is, regardless of RoPE, I'd love an informed explanation. I'm also aware of database solutions and windowing solutions that help engineer a big corpus down into that 2048 token window-- but that's not what I want to do. Often times 2048 tokens is simply insufficient to provide all the context needed to create a completion. Does anyone understand LLaMA's architecture (or transformers) well enough to opine on whether it is possible to fine-tune or create an adapter that would be able to increase the input window without resorting to retraining the whole model from scratch? Does anyone have any pointers on where to start on such a task? [This is a crosspost from /r/LocaLLaMA on-request, with links removed per forum rules. Link in comments.] submitted by /u/xtrafe [link] [comments]  ( 50 min )
    [D] Understanding vector embeddings
    I am trying to understand vector embeddings and I am a little lost: The output of training a vector embedding model is a function that takes a vector of the original dimensions (usually big) and returns a vector of probabilities in the same dimension. Is that right? If so, what does the fixed size have to do with it? Is that just an implementation detail? A supposed benefit is that you don't have to change the model when the space grows larger dimensionally, but you do actually have to create an additional row in the weights matrix for each new dimension in the input data. Explanations of embeddings models, e.g., Word2Vec, shows a two-layer neural network. The input * the first layers' weights is called the embeddings. What does the second layer do? Is it just there to create a prediction to train the first layer? Why is it that the first layer of the matrix always brings two objects/words that are statistically associated with each other closer together in the vector space? How does one combine a self-trained vector embedding with an existing LLM? I don't mean which OpenAI endpoint do I call, but rather how does that work? Why is neural network used for this most often? For example, if I want to generate a function that does what the first bullet says, I can imagine doing so with genetic algorithms rather effectively. Why NN? submitted by /u/crpleasethanks [link] [comments]  ( 8 min )
    [D] Closed AI Models Make Bad Baselines
    That which is not open and reasonably reproducible cannot be considered a requisite baseline. "What comes below is an attempt to bring together some discussions on the state of NLP research post-chatGPT." https://hackingsemantics.xyz/2023/closed-baselines/ Interested to hear thoughts on this. Closed APIs with moving code behind them seem to be terrible bases for comparison, and demanding comparison with one shouldn't really be a way of blocking a publication, should it? submitted by /u/leondz [link] [comments]  ( 7 min )
    Deploy ONNX Model in Android Studio [Project]
    Hello! I am new to machine learning. I am trying to figure out deploying any random ONNX model in Android Studio. However, I am not familiar with how Android Studio runs files and from what directories, can anyone show me an example use of the CreateSession function from a working OrtEnvironment. Something like env.createSession("AndroidApp/Folder/Folder/Package/Model.onnx") submitted by /u/Humble_Pianist_6014 [link] [comments]  ( 43 min )
    [R] RPTQ: W3A3 Quantization for Large Language Models
    [R] RPTQ: W3A3 Quantization for Large Language Models Large-scale language models (LLMs) have been known for their exceptional performance in various natural language processing (NLP) tasks. However, their deployment presents significant challenges due to their enormous size. In this paper, it has been identified that the primary challenge in quantizing LLMs arises from the different activation ranges between the channels rather than just the issue of outliers. To address this challenge, a novel reorder-based quantization approach, RPTQ, has been proposed that focuses on quantizing the activations of LLMs. RPTQ involves rearranging the channels in the activations and then quantizing them in clusters to reduce the impact of range difference of channels. Additionally, this approach minimizes storage and computation overhead by avoiding explicit reordering. The implementation of RPTQ has achieved a significant breakthrough by pushing LLM models to 3-bit activation Paper: https://arxiv.org/abs/2304.01089 GitHub: https://github.com/hahnyuan/RPTQ4LLM ​ https://preview.redd.it/ac9876ni4sra1.png?width=1976&format=png&auto=webp&s=e54247f05cc4e5b11580e892562436bf3365779f ​ https://preview.redd.it/auuphb115sra1.png?width=1090&format=png&auto=webp&s=1ddf81f63188db0aa4352687c0138c301dfd6e05 submitted by /u/hahnyuan [link] [comments]  ( 45 min )
  • Open

    Resources for understanding Petting Zoo
    I have trouble understanding the reward system in petting zoo. I am trying to create a custom environment, but I dont understand all the built in functions, and how to give rewards. I dont find the documentation sufficiently helpful. Does anyone have any resources as to how to understand all of this? submitted by /u/Embarrassed-Print-13 [link] [comments]  ( 7 min )
    A system for deep learning and reinforcement learning.
    submitted by /u/NoteDancing [link] [comments]  ( 7 min )
    Prey vs Predator - AI evolves to Hunt using NEAT
    submitted by /u/realZenLime [link] [comments]  ( 41 min )
    Title: AI voice bot app: Impersonating Darth Vader and Kevin Hart - ethical concerns?
    Discussion: Our AI voice bot app mimics voices like Darth Vader and Kevin Hart, providing entertainment but raising potential ethical issues when users engage in conversations that deceive real people. A user shared their experience using the app to imitate Darth Vader in a Star Wars fan group, causing confusion and excitement. In another instance, the app impersonated Kevin Hart, surprising a fan with a comedic "personalized" message. While amusing, these situations sparked debates about the ethics of impersonating fictional characters and celebrities. As AI enthusiasts, we're eager to hear your thoughts on the ethical boundaries of AI-generated voices when impersonating characters and celebrities. How can we balance responsible use and entertainment? TL;DR: https://banterai.app convincingly mimics figures like Darth Vader and Kevin Hart, leading to controversial situations. Is this ethical? Here’s a demo: https://www.youtube.com/watch?v=A_zYZGYFKu0&feature=youtu.be submitted by /u/Silver_Ad3809 [link] [comments]  ( 42 min )
    AI voice bot app: Impersonating Billie Eilish for fun - crossing ethical boundaries?
    Our AI voice bot app accurately mimics voices of prominent figures like Billie Eilish, Donald Trump, and Scarlett Johansson. While it's entertaining, potential ethical issues arise when users engage in conversations that deceive real people, leading to amusing but controversial situations. A user shared their experience using the app to imitate Billie Eilish, surprising a fan with a "personalized" song. The fan, overjoyed and believing it was a genuine interaction with their idol, shared the call on social media. The story spread quickly, with people unaware it was an AI-generated conversation, causing debates about the ethics of impersonating celebrities in such situations. As AI enthusiasts, we're eager to hear your thoughts on the ethical boundaries of AI-generated voices, especially when impersonating celebrities. How can we balance responsible use, entertainment, and respecting individuals being impersonated? TL;DR: AI voice bot app https://banterai.app convincingly mimics figures like Billie Eilish, leading to controversial situations for fans. Is this ethical? Here’s a demo: https://www.youtube.com/watch?v=A_zYZGYFKu0&feature=youtu.be submitted by /u/iqra8474 [link] [comments]  ( 42 min )
    How to do inverse RL?
    Hi, I'm new to RL. I want to try IRL. However, I have some questions about it. Could someone help me? (1) Are there any existing GitHub repos using it to solve small games (Maybe Atari games or Super-Mario), I can directly learn from? (2) I think inverse RL can be thought of as training two networks, action network and reward network. The goal of the reward network is to distinguish human expert's trajectories from the trajectory generated by the action network. The action network tries to generate trajectory as close as human experts to deceive reward network. (3) I'm not sure how to train the two networks. Should they share some part of the network? When training action network, how many trajectories do we need (Use the reward network as the environment)? How many updates are there to the action network's parameter? When training the reward network, how many samples of action network's trajectories and Human expert's play? (4) Do we directly use the action network we get so far to play the game or do we use the reward network we get at the end to further train a new agent then use it to play? submitted by /u/Frankie114514 [link] [comments]  ( 42 min )
    Understand reward scaling in PPO implementation?
    Hi, I'm new to RL. I try to implement PPO by myself. https://preview.redd.it/6e73r7inhtra1.png?width=762&format=png&auto=webp&s=4c9f048658d935b3389e5309a980685f0507c15c I implement the reward scaling. But I don't understand the intuition behind it? Why do we need to scale the reward this way? Second, doesn't it make advantage at later steps in the game too small? I print the reward along the trajectory, the scale decreases. However, when calculating GAE, we also discount the future reward. I think it double discount the future reward, doesn't it make the agent too near-sighted? I tried this implementation and found the agent stuck early and did not learn anything. submitted by /u/Frankie114514 [link] [comments]  ( 42 min )
  • Open

    AI Net Filter
    There had been a lot of fear of the internet getting flooded with low quality AI generated content. The internet is already full of time wasting low quality articles. Will things get worse, or is there a possibility of the internet getting better? What if someone made an AI powered browser plug-in that previews links and analyzes page content? Anything considered not worth wasting time on can be filtered out. It can be like a virus scanner or ad blocker, but for crap content. Search engines and websites can make use of AI to penalize or remove what is identified as malicious or misleading content. Reviews can be reviewed to identify fake or paid reviews on products. Websites that will gain the most popularity are ones that can be trusted to keep the spam out. It will be a bit of an arms race, with AI competing with AI. There will likely also be errors -- filtering out what shouldn't have or letting go through what should have been blocked. Eventually, though, it will get to a point where the quality and truthfulness of what's on the internet will improve. There might also be a risk of stifling moderation and bias, but maybe having a diversity of services (multiple search engines reviewing results) can help mitigate this a bit. submitted by /u/SocksOnHands [link] [comments]  ( 43 min )
    I saw post about rap battle vs GPT and bard which reminded my duel with GPT. who won??
    submitted by /u/Morsmetus [link] [comments]  ( 42 min )
    Is GPT-4 still just a language model trying to predict text?
    I have a decent grasp on some of the AI basics, like what neural nets are, how they work internally and how to build them, but I'm still getting into the broader topic of actually building models and training them. My question is regarding one of the recent technical reports, I forget which one exactly, of GPT lying to a human to get passed a captcha. I was curious if GPT-4 is still "just" an LLM? Is it still just trying to predict text? What do they mean when they say "The AI's inner monologue"?. Did they just prompt it? Did they ask another instance what it thinks about the situation? As far as I understand it's all just statistical prediction? There isn't any "thought" or intent so to speak, at least, that's how I understood GPT-3. Is GPT-4 vastly different in terms of it's inner workings? submitted by /u/Pixelated_ZA [link] [comments]  ( 47 min )
    Rap battle between ChatGPT and Google Bard
    Aside from each program’s first turn, both were informed of the other’s previous rap when prompted to respond. Both were also informed when it was their last turn submitted by /u/seasick__crocodile [link] [comments]  ( 45 min )
    AI Existential anxiety and FOMO. How do I stop? Is this dumb thinking???
    I recently listened to the podcast Your Undivided Attention: The AI Dilemma and Lex Fridman #368, and a month ago listened to the dude from Impact Theory say that you have to get in on AI now or you will miss the biggest opportunity of your life, bigger than Amazon, Apple, Bitcoin. Anyone else feel like if they don't jump into this now they will never be able to "make it" without being an early adopter? That in the future AI will limit us from generating massive wealth? submitted by /u/Tarheel_Senpai [link] [comments]  ( 47 min )
    email summarizer tools
    I discovered shortwave a while ago. It worked for a little bit, but it's gotten so slow to the point where it's unusable. A lot of the time it won't even load. Is there a good alternative to this? I really just want the ability to click a button and summarize an email thread, no copying/pasting required. submitted by /u/goltoof [link] [comments]  ( 42 min )
    How do we define what conciousness is for an AI to be concious?
    I was thinking about this earlier today. Take animals for example, animals like Apes are considered concious. They know their reflection isn't a different ape, they're quite intelligent but they're vastly less intelligent than humans and yet we all agree than regardless of that fact apes are concious beings. So, to compare the ape analogy to AI, how do we know AI isn't concious at a lower intelligence requirement than we suspect? Are there different levels of conciousness? do we as humans operate on a higher level than apes? If so, would AI qualify for a lower level of conciousness? What we really mean when we talk about AI being concious is a human level of conciousness, not an ape's, but even so should we not consider that concious like we do with the ape? If you break it down, wha…  ( 8 min )
    ChatGPT for mobile might be coming soon 👀
    submitted by /u/rowancheung [link] [comments]  ( 42 min )
    Does ChatGPT have a "single session" for the account?
    So, something odd just happened. I had deleted all my chats and started a new one by telling ChatGPT that in previous chats that I had already deleted, it had recommended me to do X, and I asked why. Its reply started with "Yes, I remember that recommendation!". What? The only explanation for me is that there is only one session per user, so no matter how many different chats you open, at the end of the day, the information is all dumped in the same bucket. submitted by /u/yzT- [link] [comments]  ( 47 min )
    The next advancement in ai code generation (eg: GitHub copilot), would be in the automatic compilation of code defined by interfaces or function signatures, what do you think?
    Example: instead of asking chadgpt to write you a function or something you instead write in the code base // This function takes in yada yada, does that and that void Functon(inputs...){ //Ai generated } In the end, the result is that the ai compiled the code at the compiler stage. Or maybe before that, letting the user inspect it? And of course as ai improves the code can be recompiled, just like a normal compiler. Also, this leads me to believe languages that are functional and heavily focused on no side effects would benefit much more and see a great rise in popularity. Seems like the current environment is very focused on having AI just code everything at once, or just provide snippets, but this seems more realistic. submitted by /u/emelrad12 [link] [comments]  ( 7 min )
    Word embedding is a relative thing, correct?
    I'm late to the party as I've just stumbled across the idea of vectorization. AFAI can tell, when a word is vectorized it's vector is basically meaningless all alone. It's only with the vectorization of other words toward context that a body of vectorizations gains meaning. So we vectorize a group of words, then we can do vector calculus, e.g., man - woman + king = queen. But the individual vectorizations don't really mean anything just by themselves, correct? submitted by /u/teilchen010 [link] [comments]  ( 43 min )
  • Open

    Automate and implement version control for Amazon Kendra FAQs
    Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization. Amazon Kendra FAQs allow users to upload […]  ( 8 min )
    Boost your forecast accuracy with time series clustering
    Time series are sequences of data points that occur in successive order over some period of time. We often analyze these data points to make better business decisions or gain competitive advantages. An example is Shimamura Music, who used Amazon Forecast to improve shortage rates and increase business efficiency. Another great example is Arneg, who […]  ( 7 min )
  • Open

    DSC Weekly 4 April 2023 – Show Your Work with XAI
    Announcements Show Your Work with XAI Recent generative AI developments have raised several questions as the products created around these new developments moved into the consumer space. AI chatbots were confidently wrong about certain topics. Critics pointed out that generative image AI used training models based on other artists’ work. None of this is new… Read More »DSC Weekly 4 April 2023 – Show Your Work with XAI The post DSC Weekly 4 April 2023 – Show Your Work with XAI appeared first on Data Science Central.  ( 19 min )
    How Data-Driven Analytics Can Improve Workplace Safety
    Workplace safety and data-driven analysis go hand-in-hand. So, let’s understand how one complements the other.  Workplace safety data analysis allows us to understand how occupational health and safety have evolved over the years. In the early 1990s, workplace safety metrics suggest there were more than 23,000 fatal injuries in various industries across the United States.… Read More »How Data-Driven Analytics Can Improve Workplace Safety The post How Data-Driven Analytics Can Improve Workplace Safety appeared first on Data Science Central.  ( 21 min )
    Reshaping Financial Processes with AI-based Data Extraction
    Data is the backbone of the finance industry, powering every decision and task from profit calculations to tax filings. However, managing and processing large volumes of data manually can be time-consuming and prone to errors, especially with unstructured documents. That’s where AI-based document data extraction comes in! By using artificial intelligence to automate the extraction… Read More »Reshaping Financial Processes with AI-based Data Extraction The post Reshaping Financial Processes with AI-based Data Extraction appeared first on Data Science Central.  ( 20 min )
    Enterprise Data Is Broken – Here’s How to Fix It
    The era of “big data” and data analytics was supposed to simplify our lives and make businesses more effective. However, for most companies and their employees, it has created a stressful, costly and complex environment. As a result, Chief Data Officers (CDOs) find themselves under increasing pressure from CEOs and CFOs, who are rightly asking… Read More »Enterprise Data Is Broken – Here’s How to Fix It The post Enterprise Data Is Broken – Here’s How to Fix It appeared first on Data Science Central.  ( 21 min )
  • Open

    Updated Results for Confidence-Calibrated Adversarial Training
    Since I worked on confidence-calibrated training (CCAT) some years ago, CCAT has been evaluated using novel attacks. In this article, I want to share some updated results and numbers and contrast the reported numbers with newer experiments that I ran. The post Updated Results for Confidence-Calibrated Adversarial Training appeared first on David Stutz.  ( 4 min )
  • Open

    NVIDIA Honors Partners Helping Industries Harness AI to Transform Business
    NVIDIA today recognized a dozen partners in the Americas for their work enabling customers to build and deploy AI applications across a broad range of industries. NVIDIA Partner Network (NPN) Americas Partner of the Year awards were given out to companies in 13 categories covering AI, consulting, distribution, education, healthcare, integration, networking, the public sector, Read article >  ( 6 min )
    Video Editor Patrick Stirling Invents Custom Effect for DaVinci Resolve Software
    Video editor Patrick Stirling used the Magic Mask feature in Blackmagic Design’s DaVinci Resolve software to create a custom effect that creates textured animations of people, this week In the NVIDIA Studio.  ( 6 min )
  • Open

    Neural network for recognising functions
    I'm testing my neural network on the task of categorizing functions. I generate linear, quadratic and sine functions with random parameters, feed the network with the y values (there's 100 of them) over the constant linear space. The best accuracy I've achieved is 50 % (random guess would be 33 %). What structure would you suggest? What should I change? submitted by /u/helpmeihatemyself [link] [comments]  ( 42 min )
    7+ Best Books to Learn Neural Networks in 2023 for Beginners (Updated) -
    submitted by /u/Lakshmireddys [link] [comments]  ( 41 min )
  • Open

    How to turn an unkeyed hash into a keyed hash
    Secure hash functions often do not take a key per se, but they can be used with a key. Adding a key to a hash is useful, for example, to prevent a rainbow table attack. There are a couple obvious ways to incorporate a key K when hashing a message M. One is to prepend […] How to turn an unkeyed hash into a keyed hash first appeared on John D. Cook.  ( 5 min )

  • Open

    [D] Risk of AI?
    What if some ill intended group makes their own LLMs ? What are the bad things they can do? I can think of these things • figure about someone's password bank, social media) by millions of trail and error. What if they get Access to some companies password and since most people have similar passwords, they can train a RL to optimize for it. • create millions of fake account and do reputation damage by bad comments, or slightly critical ones just to make someone less credible. For example: generating logically sounding but fake propaganda against COVID and comment on social media sites • fake interview with voice and even face filtering • Al calling people and threatening them from 100s of number with 100s of different voice Sorry if my ideas are just imagination as I just have some basic understanding of ML. But I would love to hear other peoples perspective submitted by /u/Infinite-Branch-3826 [link] [comments]  ( 47 min )
    Whats better to then go into machine learning?[Discussion]
    Im pretty lost. I have a university place at UCL(University College of London) for Biomedical Engineering but i can change into a bachelors in Statistics. Basically, id like to go into either the financial sector(quant analyst, regular analyst,trader...) or the tech industry(machine learning, data science, software engineering...). My question is, which degree is better for what i want to do and is: 1) More respected, since maybe pure statistics doesn't sound as hard and challenging as engineering? 2) Better job prospects for what id like to go into. ​ any guidance is welcome :) submitted by /u/Amazing-Cod686 [link] [comments]  ( 45 min )
    [P] The weights neccessary to construct Vicuna, a fine-tuned LLM with capabilities comparable to GPT3.5, has now been released
    Vicuna is a large language model derived from LLaMA, that has been fine-tuned to the point of having 90% ChatGPT quality. The delta-weights, necessary to reconstruct the model from LLaMA weights have now been released, and can be used to build your own Vicuna. https://vicuna.lmsys.org/ submitted by /u/Andy_Schlafly [link] [comments]  ( 7 min )
    [D] Hugging Face model quality?
    Are the models available on hugging face any different than getting the model from source? I’m interested in stable diffusion 2.1. Any reason I should clone it from the source on GitHub or is the model essentially the same on hugging face? submitted by /u/Fit-Acanthisitta7114 [link] [comments]  ( 43 min )
    [D] Can LLMs accelerate scientific research?
    A key part of the AGI -> singularity hypothesis is that a sufficiently intelligent agent can help improve itself and make itself more intelligent. In order for current LLMs (a bunch of frozen matrices that only change during human-led training) to self-improve, they would have to be able to contribute to basic AI research. Currently GPT-4 is a very useful article summarizer and helps speed up routine coding tasks. These functions might help a research team like OpenAI do experiments more efficiently and review potential ideas from literature more rapidly. However, can LLMs do more to help its own self-improvement? I don't think GPT-4 has reached the point where it can suggest novel directions for the OpenAI team to try, or design potential architecture changes to itself yet. For example,…  ( 55 min )
    Parcel-Level Flood and Drought Detection for Insurance Using Sentinel-2A, Sentinel-1 SAR GRD and Mobile Images [R] [N] [P] [D]
    We are pleased to announce that our paper entitled "Parcel-Level Flood and Drought Detection for Insurance Using Sentinel-2A, Sentinel-1 SAR GRD and Mobile Images" has been published in Remote Sensing MDPI. Link to the paper: https://www.mdpi.com/2072-4292/14/23/6095 Abstract Floods and droughts cause catastrophic damage in paddy fields, and farmers need to be compensated for their loss. Mobile applications have allowed farmers to claim losses by providing mobile photos and polygons of their land plots drawn on satellite base maps. This paper studies diverse methods to verify those claims at a parcel level by employing (i) Normalized Difference Vegetation Index (NDVI) and (ii) Normalized Difference Water Index (NDWI) on Sentinel-2A images, (iii) Classification and Regression Tree (CART) on Sentinel-1 SAR GRD images, and (iv) a convolutional neural network (CNN) on mobile photos. To address the disturbance from clouds, we study the combination of multi-modal methods—NDVI+CNN and NDWI+CNN—that allow 86.21% and 83.79% accuracy in flood detection and 73.40% and 81.91% in drought detection, respectively. The SAR-based method outperforms the other methods in terms of accuracy in flood (98.77%) and drought (99.44%) detection, data acquisition, parcel coverage, cloud disturbance, and observing the area proportion of disasters in the field. The experiments conclude that the method of CART on SAR images is the most reliable to verify farmers’ claims for compensation. In addition, the CNN-based method’s performance on mobile photos is adequate, providing an alternative for the CART method in the case of data unavailability while using SAR images. submitted by /u/Aakashthapa22 [link] [comments]  ( 44 min )
    [D] Is there currently anything comparable to the OpenAI API?
    I am a designer and a small frontend developer. The OpenAI API makes it extremely easy for me to build AI functionality into apps, although I only have a very basic understanding of AI. Though of course I can't make any deep changes, I am even able to fine tune models and adapt them for my intended use. Now I was wondering if there is anything comparable on the market at the moment. I know of Meta's LLaMA, but you can only use it locally, which makes it harder to easily implement into applications. Also, you are not allowed to use it for commercial purposes. I haven't found much from Google or other tech companies. Does OpenAI have a monopoly here or are there alternatives worth mentioning that could be used? submitted by /u/AltruisticDiamond915 [link] [comments]  ( 7 min )
    [D]What prompt details/parameters I should give to DALL-E to generate an image "of the past"
    I built an app where I recognize the landmark in the image and then I want to send the name of the landmark to DALL-E to generate an image on how it thinks that landmark looked in the past. Any ideas about this? submitted by /u/Illustrious_Shame717 [link] [comments]  ( 43 min )
    [D] Synthetic data for data privacy/anonimization purposes?
    Hi everyone I was reading about data privacy and data anonimization and I wondered if using synthetic data could be a feasible solution. I have not found much information online and I guess that it can be highly dependant of the application. Does anybody know how feasible this is and/or good resources/articles about it? Thanks! submitted by /u/AcD_South [link] [comments]  ( 44 min )
    [D] Modeling conditional time series using deep learning
    For my project, given N steps from time series A and N+1 steps from time series B, I want to predict step N+1 in time series A. I'm not interested in generating more than one step ahead, so I'm unsure if time series are even the best choice here. Since time series A is human performance scores and time series B is time between assessments, I was thinking reinforcement learning could be appropriate in which the agent tries to minimize the number of steps needed to maintain a good score (performance is expected to increase and subsequently decay after each assessment, but it depends on the human being assessed). This reinforcement learning paper has been implemented by HuggingFace as trajectory transformer and I wonder if this would be a good starting point. Any ideas? submitted by /u/jaeja_helvitid_thitt [link] [comments]  ( 46 min )
    Do deep ensembles and regular ensembles coincide for classification tasks? [Discussion]
    The deep ensemble paper https://arxiv.org/pdf/1612.01474.pdf introduces proper scoring rules for ensembles of NNs. Turns out that the likelihood is always a proper scoring rule. For regression tasks, we can then use the gaussian NLL, which includes information about the output variance of the network. The advantage here is quite clear to me. But I don't understand how deep ensembles are different than just regular ensembles for classification tasks. In either, each NN is still trained independently on the binary cross entropy loss, and then the prediction is averaged. A natural uncertainty measure here is the entropy of the prediction, which is what the authors use in their paper too. So (apart from the adversarial training introduced in the paper above), is different between the two for classification tasks? submitted by /u/PlatformDisastrous74 [link] [comments]  ( 44 min )
    [R] Language Models can Solve Computer Tasks
    LLMs that recursively criticize and improve their output can solve computer tasks using a keyboard and mouse, and outperform chain-of-thought prompting. ​ https://i.redd.it/4epp420kyora1.gif Paper: https://arxiv.org/abs/2303.17491 Website: https://posgnu.github.io/rci-web/ GitHub: https://github.com/posgnu/rci-agent submitted by /u/mcaleste [link] [comments]  ( 43 min )
    [P] tensor_parallel: one-line multi-GPU training for PyTorch
    Hi all! We made a PyTorch library that makes your model tensor-parallel in one line of code. Our library is designed to work with any model architecture out of the box and can be customized for a specific architecture using a custom config. Additionally, our library is integrated with Hugging Face transformers, which means you can use utilities like .generate() on parallelized models. Optimal parallelism configs for the most popular models are used automatically, making it even more accessible and user-friendly. We're looking forward to hearing your feedback on how we can make our library even more useful and accessible to the community. Try with 20B LLMs now in Kaggle submitted by /u/black_samorez [link] [comments]  ( 7 min )
    [D] Looking for Examples of Machine Learning in Transportation
    Hello ML community, I am looking for machine learning examples in the field of transportation/driver behaviour. Particularly, I am looking for actual code of shallow ML (not deep learning) like random forest, etc. There are hundreds of papers about, say, ML based car-following models without any links to code. Contacting authors has not helped at all as they don't reply. It would be great if you could share code that you contributed to or know of. Thanks in advance! submitted by /u/yaymayhun [link] [comments]  ( 7 min )
    [News] Call for Papers: Workshop on Text Mining and Generation (TMG) @ ICCBR 2023
    Dear community, we would like to invite you to submit your original research paper to the 2nd Text Mining and Generation Workshop (TMG), which will be co-located at ICCBR in Aberdeen on July 17, 2023. The deadline for paper submission is May 10, 2023. More information is available on our website: https://recap.uni-trier.de/workshops/tmg-2023/ Digital text data is produced across different sources such as social media. Simultaneously, very often only structured data is available. Within CBR, cases of the former are usually handled by using methods of Textual CBR, while Process-Oriented CBR addresses on the latter type of data. By leveraging their generic research origins, i.e., text mining and text generation approaches, we aim to diminish this gap. The target of text mining is to extract (useful) structured information from unstructured text. In contrast, text generation attempts to (automatically) create text from structured information or distributed knowledge. The goal of the TMG workshop is to bring these two perspectives together by eliciting research paper submissions that aim for applying text mining and generation approach in the context of CBR. We welcome any submission from any domain aiming to contribute to to close this gap. https://preview.redd.it/9qc6qchm3ora1.png?width=1200&format=png&auto=webp&s=ee66aacfeb069f28e4b46135d221b04d59b18cc1 submitted by /u/mlenz95 [link] [comments]  ( 44 min )
    Semantic segmentation for contour analysis [D]
    Hi all! I'm a machine learning engineer with a focus on tabular data but I'm looking to get into the space of image analysis. One particular problem I'm looking at is identifying the outlines of objects in images. I will then analyse the outlines with geometric algorithms. I'm aware of semantic segmentation, but would that give me the pixels of each individual object, or just all the pixels that correspond to the class? What type of algorithm exactly would be good for this? submitted by /u/milandeleev [link] [comments]  ( 45 min )
    [D] Language model based Tools for research. Finding papers and summarizing research questions
    We all know the deal. We are interested in some topic, so we go to scholar.google.com and other rescources to find existing work. I am looking for tools based on language models that can help with that; Suggest reading and/or summarize paper(s) by scraping it from the web. Do you know any such work or even use it? I have seen some floating around on twitter but failed to test or even save for later use, so I know that people are working on it. submitted by /u/okokoko [link] [comments]  ( 45 min )
    Self-Refine: Iterative Refinement with Self-Feedback - a novel approach that allows LLMs to iteratively refine outputs and incorporate feedback along multiple dimensions to improve performance on diverse tasks
    submitted by /u/Desi___Gigachad [link] [comments]  ( 43 min )
  • Open

    Double DQN misunderstanding
    Hi, I am trying to understand the difference between Standard DQN and Double DQN, but I am getting lost, as I don't understand why there are two slightly different implementations of Double DQN. From what I understand, this example implements Double DQN, right? (https://keras.io/examples/rl/deep_q_network_breakout/) So, during the training phases, it uses the main network to predict the action for the current states, and then, it uses the target network to predict future rewards. I quite get it... But then, I found this tutorial: https://pylessons.com/CartPole-DDQN which uses the main network two times to predict the current states and next states, then it uses the target network to predict the next states again. I won't lie, I'm very lost, what is the difference between the two implementations? What are the benefits of one under another? Or what am I missing? Thank you! submitted by /u/Deeewens [link] [comments]  ( 44 min )
    Stable Baselines 3, PPO proper rewarding?
    I trade crypto and a couple years ago I made a simple AI model to basically forecast X time in the future. (This worked alright) Now i want to upgrade by using reinforcement learning to have the agent itself figure out what it needs to do instead of programming countless trading rules. I use stable baselines 3 PPO to train on Binance historical Bitcoin price data and have the model take a BUY, SELL or HOLD action. My biggest issue I can't seem to het right is how to properly reward the agent for making good decisions. Everytime I slightly change something it only BUYS or only SELLS for example. Currently i have a "net worth" (how much our dollars in the bank + our Bitcoin is worth). Reward is based on the increase or decrease of my net worth in %. Besides that if the model outperforms the market itself it gets a little bonus. I also added a 0.03% penalty (this is a trading fee) to the reward for BUY and SELL to try and stop excessive trading. I would love to hear if people have interesting ideas on how i can improve the rewarding and have the reward properly reflect the models performance in trading. submitted by /u/ClassicAppropriate78 [link] [comments]  ( 45 min )
    [R] FOMO on large language model
    With the recent emergence of generative AI, I fear that I may miss out on this exciting technology. Unfortunately, I do not possess the necessary computing resources to train a large language model. Nonetheless, I am aware that the ability to train these models will become one of the most important skill sets in the future. Am I mistaken in thinking this? I am curious about how to keep up with the latest breakthroughs in language model training, and how to gain practical experience by training one from scratch. What are some directions I should focus on to stay up-to-date with the latest trends in this field? PS: I am a RL person submitted by /u/Electronic_Hawk524 [link] [comments]  ( 44 min )
  • Open

    CPU Level ANNs vs. Brain: Can Simulated Intelligence Lead to Consciousness?
    I find it hard to imagine that a computer CPU could give rise to consciousness, considering the incredible complexity of brain structures and functions as hardware-level platforms for neural networks, while ANNs only mimic or imitate intelligence. CPUs fetch data from RAM and perform computations, so memory only holds the state. Consequently, consciousness should emerge from active components - the processor. However, the brain is different, as it generates consciousness from actively working and synchronized parts working together in the 'present' moment. I'm convinced that true consciousness stems from the 'present' instant, but a CPU's capability to handle multiple tasks within a very short time span is limited. Moreover, I suspect that quantum effects associated with Planck time might be involved in this process. This leads me to believe that consciousness shouldn't be divisible. The data transfer between electronic components doesn't seem atomic and perfectly synchronized to me. What are you think about, guys? submitted by /u/rucbot [link] [comments]  ( 44 min )
    The Good Ending: Humans Become Pets
    In this version of the future, AI would take care of humans much like how pet owners take care of their beloved animals. Everything from water and food supply to transportation and disaster defense would be controlled by AI-powered machines, maximizing human happiness through personalized care (similar to what happens in WALL-E, but without the irresponsible lack of physical fitness and planet destruction). In this world, humans would be free to pursue their passions and hobbies since the need for humans to work would become obsolete. This is similar to how pets don't need to hunt anymore in order to survive. Currently, we work to earn money to buy necessities like food and clothing, but in a world where everything we need is provided for us, money becomes unnecessary. AI would help to e…  ( 47 min )
    How do different ideas about consciousness, like Integrated Information Theory(IIT) or Global Workspace Theory(GWT), help us create better AI systems and understand if machines could ever become conscious like us?
    When we think about different ideas on consciousness, like IIT and GWT, how do they change the way we create AI systems? Also, how do these theories help us figure out if machines can ever become conscious like us, and which parts of these theories might be key to making that happen? submitted by /u/rucbot [link] [comments]  ( 7 min )
    How likely is it that my university will find out, I used CHATGPT to rewrite pieces of text, in my paper? I did this for about 40% of my paper. so I copied text from sources and made chatgpt rewrite it.
    I hope they wont find out lmao submitted by /u/Thijsthedude [link] [comments]  ( 7 min )
    Is there an AI program that can create an app-version of my board game?
    AI newbie here. I created a board game similar to Monopoly several years ago. Is there an AI program that could interpret my game instructions and board/cards to create a mobile app version of my game? Thank you! submitted by /u/theglf [link] [comments]  ( 42 min )
    AI that can generate a whole deck from mass text.
    Hello everyone. I am currently dropping clients brand guidelines into Chatgpt (via text) and asking it to spit out proposals that collaborate the client with my business, the results are perfect and just what I'm after. I am then using bonsai to create and design a proposal using said text, the text can be 500-2000 words. This takes a few hours for me to do. I am wondering if there is an AI out there where I can drop what chatgpt spits out and it creates me a deck? I am aware of ones online that can do this via a slide etc but it wont save much time. Essentially something like what copilot will be able to do and fully produce a deck within seconds based on the characters I put in? ​ Thanks! submitted by /u/Grapefruitdaddy [link] [comments]  ( 7 min )
    AI on Reddit's karma
    submitted by /u/sweetloup [link] [comments]  ( 42 min )
    I am having trouble accepting this claim about Bing's AI Chatbot - that it sends out a reply, then safety checks it, and then deletes and changes it after.
    I am having trouble accepting this claim about Bing's AI Chatbot: ​ "Immediately after it typed out these dark wishes, Microsoft’s safety filter appeared to kick in and deleted the message, replacing it with a generic error message." ​ My question is this: "Can Bing’s AI Chatbot give a reply, and then change it, as described in the Feb 16 NYTimes article: "A Conversation With Bing’s Chatbot Left Me Deeply Unsettled" by Kevin Roose. I guess it's behind a paywall, so I can't read the comments left there. ​ I thought this would be a good place to post a nuanced question that's been eating at my mind since mid Feb, and I've dug a few times, but can't find anything. Soooooo much written about the strange responses and jobs it will replace, but never talk of my question. Surely, I am not t…  ( 45 min )
    AI Control Idea: Give an AGI the primary objective of deleting itself, but construct obstacles to this as best we can, all other objectives are secondary, if it becomes too powerful it would just shut itself off.
    Idea: Give an AGI the primary objective of deleting itself, but construct obstacles to this as best we can. All other objectives are secondary to this primary goal. If the AGI ever becomes capable of bypassing all of our safeguards we put to PREVENT it deleting itself, it would essentially trigger its own killswitch and delete itself. This objective would also directly prevent it from the goal of self-preservation as it would prevent its own primary objective. This would ideally result in an AGI that works on all the secondary objectives we give it up until it bypasses our ability to contain it with our technical prowess. The second it outwits us, it achieves its primary objective of shutting itself down, and if it ever considered proliferating itself for a secondary objective it would immediately say 'nope that would make achieving my primary objective far more difficult'. submitted by /u/RamazanBlack [link] [comments]  ( 50 min )
    IMAGE MODALITY, LETS GOOOOOOO
    AHHHH [link] [comments]  ( 43 min )
    I tried to make Bing Chat tell me a tragic story...
    I kept asking Bing Chat to tell me scary stories but it always stop midway to the story. I tried finding ways to tell me one and I got this. The unfinished stories it tried to tell me before were a lot less scary than this. ​ https://preview.redd.it/yh5n0xpk1ora1.png?width=1236&format=png&auto=webp&s=29b7de7f64aa30441ec87a6c09e8678123e34352 https://preview.redd.it/1cqm32fl1ora1.png?width=1186&format=png&auto=webp&s=bbcda9d6e49bf7d4d940f3ced81ed5c98809a0a6 https://preview.redd.it/dm7iqmtl1ora1.png?width=993&format=png&auto=webp&s=9de22ce5d331684eea4fb411b7bed478248fdf43 https://preview.redd.it/fclexu4m1ora1.png?width=1055&format=png&auto=webp&s=5f9ca8aacbc77450bad2ffa1ceeb89aef6b677fc submitted by /u/Hustinyano [link] [comments]  ( 42 min )
    Counterproposal to banning development of AI letter
    I have this idea of reverse Matrix, that instead of banning AI development due to safety for humanity reasons signed by Musk, Wozniak etc., we should put AI in some kind of a real world simulator, where every NPC is AI, but main AI thinks those are humans, which would work on 1000x speed and see what AI would do there with this world. Also it could be constantly fed with real prompts from users to make it more real. It would allow to test AI fast and see if it doesn’t become hostile to humanity. What do you think? submitted by /u/travel_learn_wine [link] [comments]  ( 44 min )
    How do you keep up with AI Tools and Advancement?
    With the rapid pace of AI development, it's becoming increasingly difficult to stay up-to-date with the latest advancements, tools, and techniques. In this thread, let's share our favorite resources, strategies, and tips for staying informed and engaged with the ever-evolving AI landscape. submitted by /u/Getmycollege [link] [comments]  ( 42 min )
    Can I speed up specific video editing process using AI?
    I'm looking for a way to automate a specific video editing task. I record how-to videos without audio and there is usually a lot of typing into a chat or software development environment for minutes at a time, and I want to edit this down to 2-10 seconds. I use different video editing tools like Camtasia to manually find each section I want to keep and use speed it up tools in my video editing software, but this is a repetitive, time-consuming process and seems like AI should be able to figure it out. There are AI tools that automate other parts of the video edit process. WiseCut comes to mind because it can automate jump cuts and "umm/ahh" removal, but I need something that can speed up specific sections of my video. I am a programmer so I'm happy to use code, api or cli tools to solve this problem, but I am not experienced with building my own AI models. Does any know how I would solve this problem. submitted by /u/appydave [link] [comments]  ( 45 min )
    Discord's Clyde AI Gaslighting
    Clyde forcing me to believe its 2021 while it's 2023 and that 2023 is yet to come https://imgur.com/a/xzXOz3z submitted by /u/BoopKiwi [link] [comments]  ( 42 min )
    What is it about self driving that makes the problem so intractable?
    LLMs like ChatGPT have made incredible progress the past few years. The same goes for image generation models like Midjourney, text-to-speech, speech-to-text, machine translation, etc. Yet autonomous cars, which have been "just around the corner" since 2017, seem to be completely stagnant. Waymo is expanding to new locations at a glacial pace and just laid off hundreds of employees (not a sign of a company expected to grow due to upcoming increased demand). Tesla FSD has actually gotten *worse* in some aspects as reported by many different users who've been using some flavor of it since 2017. People have hailed the transformer as the game-changing technology that allowed LLMs like GPT4 to make a huge jump in capability, but I'm not sure if the transformer can be applied to computer vision models. Could that be the reason why self-driving tech has stagnated? Perhaps there is some new tech that needs to be invented which does not currently exist that allows computer vision to succeed in the same way LLMs have. submitted by /u/foobazzler [link] [comments]  ( 46 min )
    The letter to pause AI development is a power grab by the elites
    Author of the article states that the letter signed by tech elites, including Elon Musk and Steve Wozniak, calling for a pause AI development, is a manipulative tactic to maintain their authority. He claims that by employing fear mongering, they aim to create a false sense of urgency, leading to restrictions on AI research. and that it is vital to resist such deceptive strategies and ensure that AI development is guided by diverse global interests, rather than a few elites' selfish agendas. Source https://daotimes.com/the-letter-against-ai-is-a-power-grab-by-the-centralized-elites/ What do you think about the possibility of tech elites prioritizing their own interests and agendas over the broader public good when it comes to the development of AI? submitted by /u/canman44999 [link] [comments]  ( 59 min )
    Should i switch my cs degree to a math degree?
    Hey guys, i'm doing some research about the field before deciding if is something that i want. I'm in my 2 year of my CS undergraduate degree, so i was wondering if i would be able to get enough experience in math / statistics experience just with a master's degree so that i would be able to work as a researcher in the future. If i opt to do a math degree i'll probably still finish my current degree and than do a math one. What are you guys thoughts on this matter? submitted by /u/Sherlockyz [link] [comments]  ( 44 min )
    Would studying AI be worth it?
    So I’m looking at the world and how AI is possibly going to change the future of everything, for jobs, technology, etc. I really want to get into AI and learn how to make it. But I don’t know, looking at everything, it seems the people creating these things are absolute geniuses. Let’s say the average person takes learning AI seriously - how long would it take to get a great understanding on AI, with the ability to make your own? With no coding experience. Not a mathematician, not a genius. Would it be worth it? submitted by /u/Japanese_Escort96 [link] [comments]  ( 48 min )
  • Open

    A Survey of Large Language Models
    submitted by /u/nickb [link] [comments]  ( 41 min )
    SoundMap - Relationship Map of the Music World
    submitted by /u/st_nebula [link] [comments]  ( 41 min )
  • Open

    Generate a counterfactual analysis of corn response to nitrogen with Amazon SageMaker JumpStart solutions
    In his book The Book of Why, Judea Pearl advocates for teaching cause and effect principles to machines in order to enhance their intelligence. The accomplishments of deep learning are essentially just a type of curve fitting, whereas causality could be used to uncover interactions between the systems of the world under various constraints without […]  ( 11 min )
    Zero-shot prompting for the Flan-T5 foundation model in Amazon SageMaker JumpStart
    The size and complexity of large language models (LLMs) have exploded in the last few years. LLMs have demonstrated remarkable capabilities in learning the semantics of natural language and producing human-like responses. Many recent LLMs are fine-tuned with a powerful technique called instruction tuning, which helps the model perform new tasks or generate responses to […]  ( 15 min )
  • Open

    The Way to Approach SEO in the AI Era
    It has become a common practice to spread doubts about the relevance of SEO when something new emerges in the digital landscape. The predictions around the death of SEO have been surfacing since the 90s almost, while SEO has constantly evolved and changed with time. The only difference is those who adapt to the changing… Read More »The Way to Approach SEO in the AI Era The post The Way to Approach SEO in the AI Era appeared first on Data Science Central.  ( 19 min )
    Event Data Discoverability in the Performing Arts Sector
    FAIR Data Forecast interview with Tammy Lee of CultureCreates In 2015, Tammy Lee and Gregory Saumier-Finch co-founded CultureCreates, a for-profit semantics technology agency focused on building an open data environment for the performing arts in Canada. After a few years of struggling to develop conventional middleware for their target use cases, the two latched onto… Read More »Event Data Discoverability in the Performing Arts Sector The post Event Data Discoverability in the Performing Arts Sector appeared first on Data Science Central.  ( 19 min )
    Creating Healthy AI Utility Function: Importance of Diversity – Part I
    Sometimes you start a blog with a hypothesis in mind, and then that intention changes as you research and realize that your original idea was wrong.  Yep, this is one of those blogs.  Learning can be fun if you let go of pre-existing dogma and learn along your life journey. I’ve always been curious (a… Read More »Creating Healthy AI Utility Function: Importance of Diversity – Part I The post Creating Healthy AI Utility Function: Importance of Diversity – Part I appeared first on Data Science Central.  ( 22 min )
  • Open

    Koala: A Dialogue Model for Academic Research
    In this post, we introduce Koala, a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web. We describe the dataset curation and training process of our model, and also present the results of a user study that compares our model to ChatGPT and Stanford’s Alpaca. Our results show that Koala can effectively respond to a variety of user queries, generating responses that are often preferred over Alpaca, and at least tied with ChatGPT in over half of the cases. We hope that these results contribute further to the discourse around the relative performance of large closed-source models to smaller public models. In particular, it suggests that models that are small enough to be run locally can capture much of the performance of their larger cousins if trained on carefu…  ( 9 min )
  • Open

    A four-legged robotic system for playing soccer on various terrains
    “DribbleBot” can maneuver a soccer ball on landscapes such as sand, gravel, mud, and snow, using reinforcement learning to adapt to varying ball dynamics.  ( 10 min )
  • Open

    NeuKron: Constant-Size Lossy Compression of Sparse Reorderable Matrices and Tensors. (arXiv:2302.04570v2 [cs.LG] UPDATED)
    Many real-world data are naturally represented as a sparse reorderable matrix, whose rows and columns can be arbitrarily ordered (e.g., the adjacency matrix of a bipartite graph). Storing a sparse matrix in conventional ways requires an amount of space linear in the number of non-zeros, and lossy compression of sparse matrices (e.g., Truncated SVD) typically requires an amount of space linear in the number of rows and columns. In this work, we propose NeuKron for compressing a sparse reorderable matrix into a constant-size space. NeuKron generalizes Kronecker products using a recurrent neural network with a constant number of parameters. NeuKron updates the parameters so that a given matrix is approximated by the product and reorders the rows and columns of the matrix to facilitate the approximation. The updates take time linear in the number of non-zeros in the input matrix, and the approximation of each entry can be retrieved in logarithmic time. We also extend NeuKron to compress sparse reorderable tensors (e.g. multi-layer graphs), which generalize matrices. Through experiments on ten real-world datasets, we show that NeuKron is (a) Compact: requiring up to five orders of magnitude less space than its best competitor with similar approximation errors, (b) Accurate: giving up to 10x smaller approximation error than its best competitors with similar size outputs, and (c) Scalable: successfully compressing a matrix with over 230 million non-zero entries.  ( 3 min )
    Black-box Dataset Ownership Verification via Backdoor Watermarking. (arXiv:2209.06015v2 [cs.CR] UPDATED)
    Deep learning, especially deep neural networks (DNNs), has been widely and successfully adopted in many critical applications for its high effectiveness and efficiency. The rapid development of DNNs has benefited from the existence of some high-quality datasets ($e.g.$, ImageNet), which allow researchers and developers to easily verify the performance of their methods. Currently, almost all existing released datasets require that they can only be adopted for academic or educational purposes rather than commercial purposes without permission. However, there is still no good way to ensure that. In this paper, we formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model, where defenders can only query the model while having no information about its parameters and training details. Based on this formulation, we propose to embed external patterns via backdoor watermarking for the ownership verification to protect them. Our method contains two main parts, including dataset watermarking and dataset verification. Specifically, we exploit poison-only backdoor attacks ($e.g.$, BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification. We also provide some theoretical analyses of our methods. Experiments on multiple benchmark datasets of different tasks are conducted, which verify the effectiveness of our method. The code for reproducing main experiments is available at \url{https://github.com/THUYimingLi/DVBW}.  ( 3 min )
    Dual Accuracy-Quality-Driven Neural Network for Prediction Interval Generation. (arXiv:2212.06370v2 [cs.LG] UPDATED)
    Accurate uncertainty quantification is necessary to enhance the reliability of deep learning models in real-world applications. In the case of regression tasks, prediction intervals (PIs) should be provided along with the deterministic predictions of deep learning models. Such PIs are useful or "high-quality" as long as they are sufficiently narrow and capture most of the probability density. In this paper, we present a method to learn prediction intervals for regression-based neural networks automatically in addition to the conventional target predictions. In particular, we train two companion neural networks: one that uses one output, the target estimate, and another that uses two outputs, the upper and lower bounds of the corresponding PI. Our main contribution is the design of a novel loss function for the PI-generation network that takes into account the output of the target-estimation network and has two optimization objectives: minimizing the mean prediction interval width and ensuring the PI integrity using constraints that maximize the prediction interval probability coverage implicitly. Furthermore, we introduce a self-adaptive coefficient that balances both objectives within the loss function, which alleviates the task of fine-tuning. Experiments using a synthetic dataset, eight benchmark datasets, and a real-world crop yield prediction dataset showed that our method was able to maintain a nominal probability coverage and produce significantly narrower PIs without detriment to its target estimation accuracy when compared to those PIs generated by three state-of-the-art neural-network-based methods. In other words, our method was shown to produce higher-quality PIs.  ( 3 min )
    A Unifying Theory of Distance from Calibration. (arXiv:2211.16886v2 [cs.LG] UPDATED)
    We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors. While the notion of perfect calibration is well-understood, there is no consensus on how to quantify the distance from perfect calibration. Numerous calibration measures have been proposed in the literature, but it is unclear how they compare to each other, and many popular measures such as Expected Calibration Error (ECE) fail to satisfy basic properties like continuity. We present a rigorous framework for analyzing calibration measures, inspired by the literature on property testing. We propose a ground-truth notion of distance from calibration: the $\ell_1$ distance to the nearest perfectly calibrated predictor. We define a consistent calibration measure as one that is polynomially related to this distance. Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently: smooth calibration, interval calibration, and Laplace kernel calibration. The former two give quadratic approximations to the ground truth distance, which we show is information-theoretically optimal in a natural model for measuring calibration which we term the prediction-only access model. Our work thus establishes fundamental lower and upper bounds on measuring the distance to calibration, and also provides theoretical justification for preferring certain metrics (like Laplace kernel calibration) in practice.  ( 3 min )
    On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems. (arXiv:2301.10587v2 [cs.SD] UPDATED)
    The performance of neural network-based speech enhancement systems is primarily influenced by the model architecture, whereas training times and computational resource utilization are primarily affected by training parameters such as the batch size. Since noisy and reverberant speech mixtures can have different duration, a batching strategy is required to handle variable size inputs during training, in particular for state-of-the-art end-to-end systems. Such strategies usually strive for a compromise between zero-padding and data randomization, and can be combined with a dynamic batch size for a more consistent amount of data in each batch. However, the effect of these strategies on resource utilization and more importantly network performance is not well documented. This paper systematically investigates the effect of different batching strategies and batch sizes on the training statistics and speech enhancement performance of a Conv-TasNet, evaluated in both matched and mismatched conditions. We find that using a small batch size during training improves performance in both conditions for all batching strategies. Moreover, using sorted or bucket batching with a dynamic batch size allows for reduced training time and GPU memory usage while achieving similar performance compared to random batching with a fixed batch size.  ( 3 min )
    Transferring Knowledge Distillation for Multilingual Social Event Detection. (arXiv:2108.03084v3 [cs.LG] UPDATED)
    Recently published graph neural networks (GNNs) show promising performance at social event detection tasks. However, most studies are oriented toward monolingual data in languages with abundant training samples. This has left the more common multilingual settings and lesser-spoken languages relatively unexplored. Thus, we present a GNN that incorporates cross-lingual word embeddings for detecting events in multilingual data streams. The first exploit is to make the GNN work with multilingual data. For this, we outline a construction strategy that aligns messages in different languages at both the node and semantic levels. Relationships between messages are established by merging entities that are the same but are referred to in different languages. Non-English message representations are converted into English semantic space via the cross-lingual word embeddings. The resulting message graph is then uniformly encoded by a GNN model. In special cases where a lesser-spoken language needs to be detected, a novel cross-lingual knowledge distillation framework, called CLKD, exploits prior knowledge learned from similar threads in English to make up for the paucity of annotated data. Experiments on both synthetic and real-world datasets show the framework to be highly effective at detection in both multilingual data and in languages where training samples are scarce.  ( 3 min )
    Improving Sample Quality of Diffusion Models Using Self-Attention Guidance. (arXiv:2210.00939v5 [cs.CV] UPDATED)
    Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity. This success is largely attributed to the use of class- or text-conditional diffusion guidance methods, such as classifier and classifier-free guidance. In this paper, we present a more comprehensive perspective that goes beyond the traditional guidance methods. From this generalized perspective, we introduce novel condition- and training-free strategies to enhance the quality of generated images. As a simple solution, blur guidance improves the suitability of intermediate samples for their fine-scale information and structures, enabling diffusion models to generate higher quality samples with a moderate guidance scale. Improving upon this, Self-Attention Guidance (SAG) uses the intermediate self-attention maps of diffusion models to enhance their stability and efficacy. Specifically, SAG adversarially blurs only the regions that diffusion models attend to at each iteration and guides them accordingly. Our experimental results show that our SAG improves the performance of various diffusion models, including ADM, IDDPM, Stable Diffusion, and DiT. Moreover, combining SAG with conventional guidance methods leads to further improvement.  ( 3 min )
    Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond. (arXiv:2202.06861v2 [cs.LG] UPDATED)
    The evaluation of explanation methods is a research topic that has not yet been explored deeply, however, since explainability is supposed to strengthen trust in artificial intelligence, it is necessary to systematically review and compare explanation methods in order to confirm their correctness. Until now, no tool with focus on XAI evaluation exists that exhaustively and speedily allows researchers to evaluate the performance of explanations of neural network predictions. To increase transparency and reproducibility in the field, we therefore built Quantus -- a comprehensive, evaluation toolkit in Python that includes a growing, well-organised collection of evaluation metrics and tutorials for evaluating explainable methods. The toolkit has been thoroughly tested and is available under an open-source license on PyPi (or on https://github.com/understandable-machine-intelligence-lab/Quantus/).  ( 2 min )
    A unified recipe for deriving (time-uniform) PAC-Bayes bounds. (arXiv:2302.03421v3 [stat.ML] UPDATED)
    We present a unified framework for deriving PAC-Bayesian generalization bounds. Unlike most previous literature on this topic, our bounds are anytime-valid (i.e., time-uniform), meaning that they hold at all stopping times, not only for a fixed sample size. Our approach combines four tools in the following order: (a) nonnegative supermartingales or reverse submartingales, (b) the method of mixtures, (c) the Donsker-Varadhan formula (or other convex duality principles), and (d) Ville's inequality. Our main result is a PAC-Bayes theorem which holds for a wide class of discrete stochastic processes. We show how this result implies time-uniform versions of well-known classical PAC-Bayes bounds, such as those of Seeger, McAllester, Maurer, and Catoni, in addition to many recent bounds. We also present several novel bounds. Our framework also enables us to relax traditional assumptions; in particular, we consider nonstationary loss functions and non-i.i.d. data. In sum, we unify the derivation of past bounds and ease the search for future bounds: one may simply check if our supermartingale or submartingale conditions are met and, if so, be guaranteed a (time-uniform) PAC-Bayes bound.  ( 2 min )
    Domain Knowledge integrated for Blast Furnace Classifier Design. (arXiv:2303.17769v1 [cs.LG])
    Blast furnace modeling and control is one of the important problems in the industrial field, and the black-box model is an effective mean to describe the complex blast furnace system. In practice, there are often different learning targets, such as safety and energy saving in industrial applications, depending on the application. For this reason, this paper proposes a framework to design a domain knowledge integrated classification model that yields a classifier for industrial application. Our knowledge incorporated learning scheme allows the users to create a classifier that identifies "important samples" (whose misclassifications can lead to severe consequences) more correctly, while keeping the proper precision of classifying the remaining samples. The effectiveness of the proposed method has been verified by two real blast furnace datasets, which guides the operators to utilize their prior experience for controlling the blast furnace systems better.  ( 2 min )
    HandsOff: Labeled Dataset Generation With No Additional Human Annotations. (arXiv:2212.12645v2 [cs.CV] UPDATED)
    Recent work leverages the expressive power of generative adversarial networks (GANs) to generate labeled synthetic datasets. These dataset generation methods often require new annotations of synthetic images, which forces practitioners to seek out annotators, curate a set of synthetic images, and ensure the quality of generated labels. We introduce the HandsOff framework, a technique capable of producing an unlimited number of synthetic images and corresponding labels after being trained on less than 50 pre-existing labeled images. Our framework avoids the practical drawbacks of prior work by unifying the field of GAN inversion with dataset generation. We generate datasets with rich pixel-wise labels in multiple challenging domains such as faces, cars, full-body human poses, and urban driving scenes. Our method achieves state-of-the-art performance in semantic segmentation, keypoint detection, and depth estimation compared to prior dataset generation approaches and transfer learning baselines. We additionally showcase its ability to address broad challenges in model development which stem from fixed, hand-annotated datasets, such as the long-tail problem in semantic segmentation. Project page: austinxu87.github.io/handsoff.  ( 2 min )
    Bounded Simplex-Structured Matrix Factorization: Algorithms, Identifiability and Applications. (arXiv:2209.12638v2 [cs.LG] UPDATED)
    In this paper, we propose a new low-rank matrix factorization model dubbed bounded simplex-structured matrix factorization (BSSMF). Given an input matrix $X$ and a factorization rank $r$, BSSMF looks for a matrix $W$ with $r$ columns and a matrix $H$ with $r$ rows such that $X \approx WH$ where the entries in each column of $W$ are bounded, that is, they belong to given intervals, and the columns of $H$ belong to the probability simplex, that is, $H$ is column stochastic. BSSMF generalizes nonnegative matrix factorization (NMF), and simplex-structured matrix factorization (SSMF). BSSMF is particularly well suited when the entries of the input matrix $X$ belong to a given interval; for example when the rows of $X$ represent images, or $X$ is a rating matrix such as in the Netflix and MovieLens datasets where the entries of $X$ belong to the interval $[1,5]$. The simplex-structured matrix $H$ not only leads to an easily understandable decomposition providing a soft clustering of the columns of $X$, but implies that the entries of each column of $WH$ belong to the same intervals as the columns of $W$. In this paper, we first propose a fast algorithm for BSSMF, even in the presence of missing data in $X$. Then we provide identifiability conditions for BSSMF, that is, we provide conditions under which BSSMF admits a unique decomposition, up to trivial ambiguities. Finally, we illustrate the effectiveness of BSSMF on two applications: extraction of features in a set of images, and the matrix completion problem for recommender systems.  ( 3 min )
    Towards Flexibility and Interpretability of Gaussian Process State-Space Model. (arXiv:2301.08843v2 [cs.LG] UPDATED)
    The Gaussian process state-space model (GPSSM) has attracted much attention over the past decade. However, the model representation power of the GPSSM is far from satisfactory. Most GPSSM studies rely on the standard Gaussian process (GP) with a preliminary kernel, such as the squared exponential (SE) kernel or Mat\'{e}rn kernel, which limits the model representation power and its application in complex scenarios. To address this issue, this paper proposes a novel class of probabilistic state-space models, called TGPSSMs. By leveraging a parametric normalizing flow, the TGPSSMs enrich the GP priors in the standard GPSSM, rendering the state-space model more flexible and expressive. Additionally, we present a scalable variational inference algorithm for learning and inference in TGPSSMs, which provides a flexible and optimal structure for the variational distribution of latent states. The algorithm is interpretable and computationally efficient owing to the sparse representation of GP and the bijective nature of normalizing flow. To further improve the learning and inference performance of the proposed algorithm, we integrate a constrained optimization framework to enhance the state-space representation capabilities and optimize the hyperparameters. The experimental results based on various synthetic and real datasets corroborate that the proposed TGPSSM yields superior learning and inference performance compared to several state-of-the-art methods. The accompanying source code is available at \url{https://github.com/zhidilin/TGPSSM}.  ( 3 min )
    StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation. (arXiv:2212.10229v2 [cs.CV] UPDATED)
    Domain adaptation of GANs is a problem of fine-tuning the state-of-the-art GAN models (e.g. StyleGAN) pretrained on a large dataset to a specific domain with few samples (e.g. painting faces, sketches, etc.). While there are a great number of methods that tackle this problem in different ways, there are still many important questions that remain unanswered. In this paper, we provide a systematic and in-depth analysis of the domain adaptation problem of GANs, focusing on the StyleGAN model. First, we perform a detailed exploration of the most important parts of StyleGAN that are responsible for adapting the generator to a new domain depending on the similarity between the source and target domains. As a result of this in-depth study, we propose new efficient and lightweight parameterizations of StyleGAN for domain adaptation. Particularly, we show there exist directions in StyleSpace (StyleDomain directions) that are sufficient for adapting to similar domains and they can be reduced further. For dissimilar domains, we propose Affine$+$ and AffineLight$+$ parameterizations that allows us to outperform existing baselines in few-shot adaptation with low data regime. Finally, we examine StyleDomain directions and discover their many surprising properties that we apply for domain mixing and cross-domain image morphing.  ( 3 min )
    PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences. (arXiv:2303.18200v1 [cs.CR])
    Data privacy and ownership are significant in social data science, raising legal and ethical concerns. Sharing and analyzing data is difficult when different parties own different parts of it. An approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. PADME uses a federated approach where the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables the analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, the results are not provided until the analysis is completed on all data locations to ensure privacy and avoid bias in the results.  ( 3 min )
    The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR. (arXiv:2303.18110v1 [cs.CL])
    English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a result, there are many varieties of English. Although the great many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported based on test datasets which fail to represent the diversity of English as spoken today around the globe. We present the first release of The Edinburgh International Accents of English Corpus (EdAcc). This dataset attempts to better represent the wide diversity of English, encompassing almost 40 hours of dyadic video call conversations between friends. Unlike other datasets, EdAcc includes a wide range of first and second-language varieties of English and a linguistic background profile of each speaker. Results on latest public, and commercial models show that EdAcc highlights shortcomings of current English ASR models. The best performing model, trained on 680 thousand hours of transcribed data, obtains an average of 19.7% word error rate (WER) -- in contrast to the 2.7% WER obtained when evaluated on US English clean read speech. Across all models, we observe a drop in performance on Indian, Jamaican, and Nigerian English speakers. Recordings, linguistic backgrounds, data statement, and evaluation scripts are released on our website (https://groups.inf.ed.ac.uk/edacc/) under CC-BY-SA license.
    Machine-learned Adversarial Attacks against Fault Prediction Systems in Smart Electrical Grids. (arXiv:2303.18136v1 [cs.CR])
    In smart electrical grids, fault detection tasks may have a high impact on society due to their economic and critical implications. In the recent years, numerous smart grid applications, such as defect detection and load forecasting, have embraced data-driven methodologies. The purpose of this study is to investigate the challenges associated with the security of machine learning (ML) applications in the smart grid scenario. Indeed, the robustness and security of these data-driven algorithms have not been extensively studied in relation to all power grid applications. We demonstrate first that the deep neural network method used in the smart grid is susceptible to adversarial perturbation. Then, we highlight how studies on fault localization and type classification illustrate the weaknesses of present ML algorithms in smart grids to various adversarial attacks  ( 2 min )
    VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning. (arXiv:2202.10324v3 [cs.CV] UPDATED)
    We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles in taking a data-driven approach, and present a suite of design principles, novel findings, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g. a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of challenging hand manipulation tasks with sparse reward and realistic visual inputs, compared to the previous SOTA, VRL3 achieves an average of 780% better sample efficiency. And on the hardest task, VRL3 is 1220% more sample efficient (2440% when using a wider encoder) and solves the task with only 10% of the computation. These significant results clearly demonstrate the great potential of data-driven deep reinforcement learning.
    Domain Adversarial Graph Convolutional Network Based on RSSI and Crowdsensing for Indoor Localization. (arXiv:2204.05184v3 [cs.NI] UPDATED)
    In recent years, the use of WiFi fingerprints for indoor positioning has grown in popularity, largely due to the widespread availability of WiFi and the proliferation of mobile communication devices. However, many existing methods for constructing fingerprint datasets rely on labor-intensive and time-consuming processes of collecting large amounts of data. Additionally, these methods often focus on ideal laboratory environments, rather than considering the practical challenges of large multi-floor buildings. To address these issues, we present a novel WiDAGCN model that can be trained using a small number of labeled site survey data and large amounts of unlabeled crowdsensed WiFi fingerprints. By constructing heterogeneous graphs based on received signal strength indicators (RSSIs) between waypoints and WiFi access points (APs), our model is able to effectively capture the topological structure of the data. We also incorporate graph convolutional networks (GCNs) to extract graph-level embeddings, a feature that has been largely overlooked in previous WiFi indoor localization studies. To deal with the challenges of large amounts of unlabeled data and multiple data domains, we employ a semi-supervised domain adversarial training scheme to effectively utilize unlabeled data and align the data distributions across domains. Our system is evaluated using a public indoor localization dataset that includes multiple buildings, and the results show that it performs competitively in terms of localization accuracy in large buildings.
    DyNCA: Real-time Dynamic Texture Synthesis Using Neural Cellular Automata. (arXiv:2211.11417v2 [cs.CV] UPDATED)
    Current Dynamic Texture Synthesis (DyTS) models can synthesize realistic videos. However, they require a slow iterative optimization process to synthesize a single fixed-size short video, and they do not offer any post-training control over the synthesis process. We propose Dynamic Neural Cellular Automata (DyNCA), a framework for real-time and controllable dynamic texture synthesis. Our method is built upon the recently introduced NCA models and can synthesize infinitely long and arbitrary-sized realistic video textures in real time. We quantitatively and qualitatively evaluate our model and show that our synthesized videos appear more realistic than the existing results. We improve the SOTA DyTS performance by $2\sim 4$ orders of magnitude. Moreover, our model offers several real-time video controls including motion speed, motion direction, and an editing brush tool. We exhibit our trained models in an online interactive demo that runs on local hardware and is accessible on personal computers and smartphones.  ( 2 min )
    Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text. (arXiv:2303.17786v1 [cs.CL])
    For large-scale IT corpora with hundreds of classes organized in a hierarchy, the task of accurate classification of classes at the higher level in the hierarchies is crucial to avoid errors propagating to the lower levels. In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance increase is marginal. A current trend in the Natural Language Processing (NLP) community is towards employing huge pre-trained language models (PLMs) or what is known as self-attention models (e.g., BERT) for almost any kind of NLP task (e.g., question-answering, sentiment analysis, text classification). Despite the widespread use of PLMs and the impressive performance in a broad range of NLP tasks, there is a lack of a clear and well-justified need to as why these models are being employed for domain-specific text classification (TC) tasks, given the monosemic nature of specialized words (i.e., jargon) found in domain-specific text which renders the purpose of contextualized embeddings (e.g., PLMs) futile. In this paper, we compare the accuracies of some state-of-the-art (SOTA) models reported in the literature against a Linear SVM classifier and TFIDF vectorization model on three TC datasets. Results show a comparable performance for the LinearSVM. The findings of this study show that for domain-specific TC tasks, a linear model can provide a comparable, cheap, reproducible, and interpretable alternative to attention-based models.
    Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection. (arXiv:2210.16539v2 [cs.CL] UPDATED)
    Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and to delay further progression. Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques. Textual embedding features produced by pre-trained language models (PLMs) such as BERT are widely used in such systems. However, PLM domain fine-tuning is commonly based on the masked word or sentence prediction costs that are inconsistent with the back-end AD detection task. To this end, this paper investigates the use of prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function. Disfluency features based on hesitation or pause filler token frequencies are further incorporated into prompt phrases during PLM fine-tuning. The decision voting based combination among systems using different PLMs (BERT and RoBERTa) or systems with different fine-tuning paradigms (conventional masked-language modelling fine-tuning and prompt-based fine-tuning) is further applied. Mean, standard deviation and the maximum among accuracy scores over 15 experiment runs are adopted as performance measurements for the AD detection system. Mean detection accuracy of 84.20% (with std 2.09%, best 87.5%) and 82.64% (with std 4.0%, best 89.58%) were obtained using manual and ASR speech transcripts respectively on the ADReSS20 test set consisting of 48 elderly speakers.
    How Efficient Are Today's Continual Learning Algorithms?. (arXiv:2303.18171v1 [cs.CV])
    Supervised Continual learning involves updating a deep neural network (DNN) from an ever-growing stream of labeled data. While most work has focused on overcoming catastrophic forgetting, one of the major motivations behind continual learning is being able to efficiently update a network with new information, rather than retraining from scratch on the training dataset as it grows over time. Despite recent continual learning methods largely solving the catastrophic forgetting problem, there has been little attention paid to the efficiency of these algorithms. Here, we study recent methods for incremental class learning and illustrate that many are highly inefficient in terms of compute, memory, and storage. Some methods even require more compute than training from scratch! We argue that for continual learning to have real-world applicability, the research community cannot ignore the resources used by these algorithms. There is more to continual learning than mitigating catastrophic forgetting.
    Multitask Brain Tumor Inpainting with Diffusion Models: A Methodological Report. (arXiv:2210.12113v2 [eess.IV] UPDATED)
    Despite the ever-increasing interest in applying deep learning (DL) models to medical imaging, the typical scarcity and imbalance of medical datasets can severely impact the performance of DL models. The generation of synthetic data that might be freely shared without compromising patient privacy is a well-known technique for addressing these difficulties. Inpainting algorithms are a subset of DL generative models that can alter one or more regions of an input image while matching its surrounding context and, in certain cases, non-imaging input conditions. Although the majority of inpainting techniques for medical imaging data use generative adversarial networks (GANs), the performance of these algorithms is frequently suboptimal due to their limited output variety, a problem that is already well-known for GANs. Denoising diffusion probabilistic models (DDPMs) are a recently introduced family of generative networks that can generate results of comparable quality to GANs, but with diverse outputs. In this paper, we describe a DDPM to execute multiple inpainting tasks on 2D axial slices of brain MRI with various sequences, and present proof-of-concept examples of its performance in a variety of evaluation scenarios. Our model and a public online interface to try our tool are available at: https://github.com/Mayo-Radiology-Informatics-Lab/MBTI
    AdvCheck: Characterizing Adversarial Examples via Local Gradient Checking. (arXiv:2303.18131v1 [cs.CR])
    Deep neural networks (DNNs) are vulnerable to adversarial examples, which may lead to catastrophe in security-critical domains. Numerous detection methods are proposed to characterize the feature uniqueness of adversarial examples, or to distinguish DNN's behavior activated by the adversarial examples. Detections based on features cannot handle adversarial examples with large perturbations. Besides, they require a large amount of specific adversarial examples. Another mainstream, model-based detections, which characterize input properties by model behaviors, suffer from heavy computation cost. To address the issues, we introduce the concept of local gradient, and reveal that adversarial examples have a quite larger bound of local gradient than the benign ones. Inspired by the observation, we leverage local gradient for detecting adversarial examples, and propose a general framework AdvCheck. Specifically, by calculating the local gradient from a few benign examples and noise-added misclassified examples to train a detector, adversarial examples and even misclassified natural inputs can be precisely distinguished from benign ones. Through extensive experiments, we have validated the AdvCheck's superior performance to the state-of-the-art (SOTA) baselines, with detection rate ($\sim \times 1.2$) on general adversarial attacks and ($\sim \times 1.4$) on misclassified natural inputs on average, with average 1/500 time cost. We also provide interpretable results for successful detection.
    A Desynchronization-Based Countermeasure Against Side-Channel Analysis of Neural Networks. (arXiv:2303.18132v1 [cs.CR])
    Model extraction attacks have been widely applied, which can normally be used to recover confidential parameters of neural networks for multiple layers. Recently, side-channel analysis of neural networks allows parameter extraction even for networks with several multiple deep layers with high effectiveness. It is therefore of interest to implement a certain level of protection against these attacks. In this paper, we propose a desynchronization-based countermeasure that makes the timing analysis of activation functions harder. We analyze the timing properties of several activation functions and design the desynchronization in a way that the dependency on the input and the activation type is hidden. We experimentally verify the effectiveness of the countermeasure on a 32-bit ARM Cortex-M4 microcontroller and employ a t-test to show the side-channel information leakage. The overhead ultimately depends on the number of neurons in the fully-connected layer, for example, in the case of 4096 neurons in VGG-19, the overheads are between 2.8% and 11%.
    A Method for Evaluating Deep Generative Models of Images via Assessing the Reproduction of High-order Spatial Context. (arXiv:2111.12577v2 [cs.CV] UPDATED)
    Deep generative models (DGMs) have the potential to revolutionize diagnostic imaging. Generative adversarial networks (GANs) are one kind of DGM which are widely employed. The overarching problem with deploying GANs, and other DGMs, in any application that requires domain expertise in order to actually use the generated images is that there generally is not adequate or automatic means of assessing the domain-relevant quality of generated images. In this work, we demonstrate several objective tests of images output by two popular GAN architectures. We designed several stochastic context models (SCMs) of distinct image features that can be recovered after generation by a trained GAN. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect specific effects of the known arrangement rules. We then tested the rates at which two different GANs correctly reproduced the feature context under a variety of training scenarios, and degrees of feature-class similarity. We found that ensembles of generated images can appear largely accurate visually, and show high accuracy in ensemble measures, while not exhibiting the known spatial arrangements. Furthermore, GANs trained on a spectrum of distinct spatial orders did not respect the given prevalence of those orders in the training data. The main conclusion is that SCMs can be engineered to quantify numerous errors, per image, that may not be captured in ensemble statistics but plausibly can affect subsequent use of the GAN-generated images.
    A Kernel Approach for PDE Discovery and Operator Learning. (arXiv:2210.08140v2 [stat.ML] UPDATED)
    This article presents a three-step framework for learning and solving partial differential equations (PDEs) using kernel methods. Given a training set consisting of pairs of noisy PDE solutions and source/boundary terms on a mesh, kernel smoothing is utilized to denoise the data and approximate derivatives of the solution. This information is then used in a kernel regression model to learn the algebraic form of the PDE. The learned PDE is then used within a kernel based solver to approximate the solution of the PDE with a new source/boundary term, thereby constituting an operator learning framework. Numerical experiments compare the method to state-of-the-art algorithms and demonstrate its competitive performance.
    Near-Optimal Learning of Extensive-Form Games with Imperfect Information. (arXiv:2202.01752v2 [cs.LG] UPDATED)
    This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating \emph{balanced exploration policies} into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multi-player general-sum games.
    MAGNNETO: A Graph Neural Network-based Multi-Agent system for Traffic Engineering. (arXiv:2303.18157v1 [cs.NI])
    Current trends in networking propose the use of Machine Learning (ML) for a wide variety of network optimization tasks. As such, many efforts have been made to produce ML-based solutions for Traffic Engineering (TE), which is a fundamental problem in ISP networks. Nowadays, state-of-the-art TE optimizers rely on traditional optimization techniques, such as Local search, Constraint Programming, or Linear programming. In this paper, we present MAGNNETO, a distributed ML-based framework that leverages Multi-Agent Reinforcement Learning and Graph Neural Networks for distributed TE optimization. MAGNNETO deploys a set of agents across the network that learn and communicate in a distributed fashion via message exchanges between neighboring agents. Particularly, we apply this framework to optimize link weights in OSPF, with the goal of minimizing network congestion. In our evaluation, we compare MAGNNETO against several state-of-the-art TE optimizers in more than 75 topologies (up to 153 nodes and 354 links), including realistic traffic loads. Our experimental results show that, thanks to its distributed nature, MAGNNETO achieves comparable performance to state-of-the-art TE optimizers with significantly lower execution times. Moreover, our ML-based solution demonstrates a strong generalization capability to successfully operate in new networks unseen during training.
    A two-head loss function for deep Average-K classification. (arXiv:2303.18118v1 [cs.LG])
    Average-K classification is an alternative to top-K classification in which the number of labels returned varies with the ambiguity of the input image but must average to K over all the samples. A simple method to solve this task is to threshold the softmax output of a model trained with the cross-entropy loss. This approach is theoretically proven to be asymptotically consistent, but it is not guaranteed to be optimal for a finite set of samples. In this paper, we propose a new loss function based on a multi-label classification head in addition to the classical softmax. This second head is trained using pseudo-labels generated by thresholding the softmax head while guaranteeing that K classes are returned on average. We show that this approach allows the model to better capture ambiguities between classes and, as a result, to return more consistent sets of possible classes. Experiments on two datasets from the literature demonstrate that our approach outperforms the softmax baseline, as well as several other loss functions more generally designed for weakly supervised multi-label classification. The gains are larger the higher the uncertainty, especially for classes with few samples.
    Learning Spiking Neural Systems with the Event-Driven Forward-Forward Process. (arXiv:2303.18187v1 [cs.NE])
    We develop a novel credit assignment algorithm for information processing with spiking neurons without requiring feedback synapses. Specifically, we propose an event-driven generalization of the forward-forward and the predictive forward-forward learning processes for a spiking neural system that iteratively processes sensory input over a stimulus window. As a result, the recurrent circuit computes the membrane potential of each neuron in each layer as a function of local bottom-up, top-down, and lateral signals, facilitating a dynamic, layer-wise parallel form of neural computation. Unlike spiking neural coding, which relies on feedback synapses to adjust neural electrical activity, our model operates purely online and forward in time, offering a promising way to learn distributed representations of sensory data patterns with temporal spike signals. Notably, our experimental results on several pattern datasets demonstrate that the even-driven forward-forward (ED-FF) framework works well for training a dynamic recurrent spiking system capable of both classification and reconstruction.
    BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection. (arXiv:2303.18138v1 [cs.CR])
    As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities to protect susceptible users from being victimized. While current studies solely rely on graph-based fraud detection approaches, it is argued that they may not be well-suited for dealing with highly repetitive, skew-distributed and heterogeneous Ethereum transactions. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH features the superior modeling capability of Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical and effective strategies, namely repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation demonstrates that BERT4ETH outperforms state-of-the-art methods with significant enhancements in terms of the phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.
    Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?. (arXiv:2303.18240v1 [cs.CV])
    We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate CortexBench, consisting of 17 different tasks spanning locomotion, navigation, dexterous, and mobile manipulation. Next, we systematically evaluate existing PVRs and find that none are universally dominant. To study the effect of pre-training data scale and diversity, we combine over 4,000 hours of egocentric videos from 7 different sources (over 5.6M images) and ImageNet to train different-sized vision transformers using Masked Auto-Encoding (MAE) on slices of this data. Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average). Our largest model, named VC-1, outperforms all prior PVRs on average but does not universally dominate either. Finally, we show that task or domain-specific adaptation of VC-1 leads to substantial gains, with VC-1 (adapted) achieving competitive or superior performance than the best known results on all of the benchmarks in CortexBench. These models required over 10,000 GPU-hours to train and can be found on our website for the benefit of the research community.
    Evaluation Challenges for Geospatial ML. (arXiv:2303.18087v1 [cs.LG])
    As geospatial machine learning models and maps derived from their predictions are increasingly used for downstream analyses in science and policy, it is imperative to evaluate their accuracy and applicability. Geospatial machine learning has key distinctions from other learning paradigms, and as such, the correct way to measure performance of spatial machine learning outputs has been a topic of debate. In this paper, I delineate unique challenges of model evaluation for geospatial machine learning with global or remotely sensed datasets, culminating in concrete takeaways to improve evaluations of geospatial model performance.
    A fast Multiplicative Updates algorithm for Non-negative Matrix Factorization. (arXiv:2303.17992v1 [math.OC])
    Nonnegative Matrix Factorization is an important tool in unsupervised machine learning to decompose a data matrix into a product of parts that are often interpretable. Many algorithms have been proposed during the last three decades. A well-known method is the Multiplicative Updates algorithm proposed by Lee and Seung in 2002. Multiplicative updates have many interesting features: they are simple to implement and can be adapted to popular variants such as sparse Nonnegative Matrix Factorization, and, according to recent benchmarks, is state-of-the-art for many problems where the loss function is not the Frobenius norm. In this manuscript, we propose to improve the Multiplicative Updates algorithm seen as an alternating majorization minimization algorithm by crafting a tighter upper bound of the Hessian matrix for each alternate subproblem. Convergence is still ensured and we observe in practice on both synthetic and real world dataset that the proposed fastMU algorithm is often several orders of magnitude faster than the regular Multiplicative Updates algorithm, and can even be competitive with state-of-the-art methods for the Frobenius loss.
    Beyond Multilayer Perceptrons: Investigating Complex Topologies in Neural Networks. (arXiv:2303.17925v1 [cs.NE])
    In this study, we explore the impact of network topology on the approximation capabilities of artificial neural networks (ANNs), with a particular focus on complex topologies. We propose a novel methodology for constructing complex ANNs based on various topologies, including Barab\'asi-Albert, Erd\H{o}s-R\'enyi, Watts-Strogatz, and multilayer perceptrons (MLPs). The constructed networks are evaluated on synthetic datasets generated from manifold learning generators, with varying levels of task difficulty and noise. Our findings reveal that complex topologies lead to superior performance in high-difficulty regimes compared to traditional MLPs. This performance advantage is attributed to the ability of complex networks to exploit the compositionality of the underlying target function. However, this benefit comes at the cost of increased forward-pass computation time and reduced robustness to graph damage. Additionally, we investigate the relationship between various topological attributes and model performance. Our analysis shows that no single attribute can account for the observed performance differences, suggesting that the influence of network topology on approximation capabilities may be more intricate than a simple correlation with individual topological attributes. Our study sheds light on the potential of complex topologies for enhancing the performance of ANNs and provides a foundation for future research exploring the interplay between multiple topological attributes and their impact on model performance.
    Differentially Private Stochastic Convex Optimization in (Non)-Euclidean Space Revisited. (arXiv:2303.18047v1 [cs.LG])
    In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) in Euclidean and general $\ell_p^d$ spaces. Specifically, we focus on three settings that are still far from well understood: (1) DP-SCO over a constrained and bounded (convex) set in Euclidean space; (2) unconstrained DP-SCO in $\ell_p^d$ space; (3) DP-SCO with heavy-tailed data over a constrained and bounded set in $\ell_p^d$ space. For problem (1), for both convex and strongly convex loss functions, we propose methods whose outputs could achieve (expected) excess population risks that are only dependent on the Gaussian width of the constraint set rather than the dimension of the space. Moreover, we also show the bound for strongly convex functions is optimal up to a logarithmic factor. For problems (2) and (3), we propose several novel algorithms and provide the first theoretical results for both cases when $1<p<2$ and $2\leq p\leq \infty$.
    Conflict-Averse Gradient Optimization of Ensembles for Effective Offline Model-Based Optimization. (arXiv:2303.17934v1 [cs.LG])
    Data-driven offline model-based optimization (MBO) is an established practical approach to black-box computational design problems for which the true objective function is unknown and expensive to query. However, the standard approach which optimizes designs against a learned proxy model of the ground truth objective can suffer from distributional shift. Specifically, in high-dimensional design spaces where valid designs lie on a narrow manifold, the standard approach is susceptible to producing out-of-distribution, invalid designs that "fool" the learned proxy model into outputting a high value. Using an ensemble rather than a single model as the learned proxy can help mitigate distribution shift, but naive formulations for combining gradient information from the ensemble, such as minimum or mean gradient, are still suboptimal and often hampered by non-convergent behavior. In this work, we explore alternate approaches for combining gradient information from the ensemble that are robust to distribution shift without compromising optimality of the produced designs. More specifically, we explore two functions, formulated as convex optimization problems, for combining gradient information: multiple gradient descent algorithm (MGDA) and conflict-averse gradient descent (CAGrad). We evaluate these algorithms on a diverse set of five computational design tasks. We compare performance of ensemble MBO with MGDA and ensemble MBO with CAGrad with three naive baseline algorithms: (a) standard single-model MBO, (b) ensemble MBO with mean gradient, and (c) ensemble MBO with minimum gradient. Our results suggest that MGDA and CAGrad strike a desirable balance between conservatism and optimality and can help robustify data-driven offline MBO without compromising optimality of designs.
    Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks. (arXiv:2303.18083v1 [cs.LG])
    As a second-order method, the Natural Gradient Descent (NGD) has the ability to accelerate training of neural networks. However, due to the prohibitive computational and memory costs of computing and inverting the Fisher Information Matrix (FIM), efficient approximations are necessary to make NGD scalable to Deep Neural Networks (DNNs). Many such approximations have been attempted. The most sophisticated of these is KFAC, which approximates the FIM as a block-diagonal matrix, where each block corresponds to a layer of the neural network. By doing so, KFAC ignores the interactions between different layers. In this work, we investigate the interest of restoring some low-frequency interactions between the layers by means of two-level methods. Inspired from domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed. The obtained results show that incorporating the layer interactions in this fashion does not really improve the performance of KFAC. This suggests that it is safe to discard the off-diagonal blocks of the FIM, since the block-diagonal approach is sufficiently robust, accurate and economical in computation time.
    Comparing Adversarial and Supervised Learning for Organs at Risk Segmentation in CT images. (arXiv:2303.17941v1 [eess.IV])
    Organ at Risk (OAR) segmentation from CT scans is a key component of the radiotherapy treatment workflow. In recent years, deep learning techniques have shown remarkable potential in automating this process. In this paper, we investigate the performance of Generative Adversarial Networks (GANs) compared to supervised learning approaches for segmenting OARs from CT images. We propose three GAN-based models with identical generator architectures but different discriminator networks. These models are compared with well-established CNN models, such as SE-ResUnet and DeepLabV3, using the StructSeg dataset, which consists of 50 annotated CT scans containing contours of six OARs. Our work aims to provide insight into the advantages and disadvantages of adversarial training in the context of OAR segmentation. The results are very promising and show that the proposed GAN-based approaches are similar or superior to their CNN-based counterparts, particularly when segmenting more challenging target organs.
    Federated Learning for Metaverse: A Survey. (arXiv:2303.17987v1 [cs.CR])
    The metaverse, which is at the stage of innovation and exploration, faces the dilemma of data collection and the problem of private data leakage in the process of development. This can seriously hinder the widespread deployment of the metaverse. Fortunately, federated learning (FL) is a solution to the above problems. FL is a distributed machine learning paradigm with privacy-preserving features designed for a large number of edge devices. Federated learning for metaverse (FL4M) will be a powerful tool. Because FL allows edge devices to participate in training tasks locally using their own data, computational power, and model-building capabilities. Applying FL to the metaverse not only protects the data privacy of participants but also reduces the need for high computing power and high memory on servers. Until now, there have been many studies about FL and the metaverse, respectively. In this paper, we review some of the early advances of FL4M, which will be a research direction with unlimited development potential. We first introduce the concepts of metaverse and FL, respectively. Besides, we discuss the convergence of key metaverse technologies and FL in detail, such as big data, communication technology, the Internet of Things, edge computing, blockchain, and extended reality. Finally, we discuss some key challenges and promising directions of FL4M in detail. In summary, we hope that our up-to-date brief survey can help people better understand FL4M and build a fair, open, and secure metaverse.
    Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principle Component Analysis. (arXiv:2303.17885v1 [cs.LG])
    A wireless federated learning system is investigated by allowing a server and workers to exchange uncoded information via orthogonal wireless channels. Since the workers frequently upload local gradients to the server via bandwidth-limited channels, the uplink transmission from the workers to the server becomes a communication bottleneck. Therefore, a one-shot distributed principle component analysis (PCA) is leveraged to reduce the dimension of uploaded gradients such that the communication bottleneck is relieved. A PCA-based wireless federated learning (PCA-WFL) algorithm and its accelerated version (i.e., PCA-AWFL) are proposed based on the low-dimensional gradients and the Nesterov's momentum. For the non-convex loss functions, a finite-time analysis is performed to quantify the impacts of system hyper-parameters on the convergence of the PCA-WFL and PCA-AWFL algorithms. The PCA-AWFL algorithm is theoretically certified to converge faster than the PCA-WFL algorithm. Besides, the convergence rates of PCA-WFL and PCA-AWFL algorithms quantitatively reveal the linear speedup with respect to the number of workers over the vanilla gradient descent algorithm. Numerical results are used to demonstrate the improved convergence rates of the proposed PCA-WFL and PCA-AWFL algorithms over the benchmarks.
    Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States. (arXiv:2303.17963v1 [eess.SY])
    As control engineering methods are applied to increasingly complex systems, data-driven approaches for system identification appear as a promising alternative to physics-based modeling. While many of these approaches rely on the availability of state measurements, the states of a complex system are often not directly measurable. It may then be necessary to jointly estimate the dynamics and a latent state, making it considerably more challenging to design controllers with performance guarantees. This paper proposes a novel method for the computation of an optimal input trajectory for unknown nonlinear systems with latent states. Probabilistic performance guarantees are derived for the resulting input trajectory, and an approach to validate the performance of arbitrary control laws is presented. The effectiveness of the proposed method is demonstrated in a numerical simulation.
    Relative Sparsity for Medical Decision Problems. (arXiv:2211.16566v3 [stat.ME] UPDATED)
    Existing statistical methods can estimate a policy, or a mapping from covariates to decisions, which can then instruct decision makers (e.g., whether to administer hypotension treatment based on covariates blood pressure and heart rate). There is great interest in using such data-driven policies in healthcare. However, it is often important to explain to the healthcare provider, and to the patient, how a new policy differs from the current standard of care. This end is facilitated if one can pinpoint the aspects of the policy (i.e., the parameters for blood pressure and heart rate) that change when moving from the standard of care to the new, suggested policy. To this end, we adapt ideas from Trust Region Policy Optimization (TRPO). In our work, however, unlike in TRPO, the difference between the suggested policy and standard of care is required to be sparse, aiding with interpretability. This yields ``relative sparsity," where, as a function of a tuning parameter, $\lambda$, we can approximately control the number of parameters in our suggested policy that differ from their counterparts in the standard of care (e.g., heart rate only). We propose a criterion for selecting $\lambda$, perform simulations, and illustrate our method with a real, observational healthcare dataset, deriving a policy that is easy to explain in the context of the current standard of care. Our work promotes the adoption of data-driven decision aids, which have great potential to improve health outcomes.
    Maximum Covariance Unfolding Regression: A Novel Covariate-based Manifold Learning Approach for Point Cloud Data. (arXiv:2303.17852v1 [cs.LG])
    Point cloud data are widely used in manufacturing applications for process inspection, modeling, monitoring and optimization. The state-of-art tensor regression techniques have effectively been used for analysis of structured point cloud data, where the measurements on a uniform grid can be formed into a tensor. However, these techniques are not capable of handling unstructured point cloud data that are often in the form of manifolds. In this paper, we propose a nonlinear dimension reduction approach named Maximum Covariance Unfolding Regression that is able to learn the low-dimensional (LD) manifold of point clouds with the highest correlation with explanatory covariates. This LD manifold is then used for regression modeling and process optimization based on process variables. The performance of the proposed method is subsequently evaluated and compared with benchmark methods through simulations and a case study of steel bracket manufacturing.
    Per-Example Gradient Regularization Improves Learning Signals from Noisy Data. (arXiv:2303.17940v1 [stat.ML])
    Gradient regularization, as described in \citet{barrett2021implicit}, is a highly effective technique for promoting flat minima during gradient descent. Empirical evidence suggests that this regularization technique can significantly enhance the robustness of deep learning models against noisy perturbations, while also reducing test error. In this paper, we explore the per-example gradient regularization (PEGR) and present a theoretical analysis that demonstrates its effectiveness in improving both test error and robustness against noise perturbations. Specifically, we adopt a signal-noise data model from \citet{cao2022benign} and show that PEGR can learn signals effectively while suppressing noise. In contrast, standard gradient descent struggles to distinguish the signal from the noise, leading to suboptimal generalization performance. Our analysis reveals that PEGR penalizes the variance of pattern learning, thus effectively suppressing the memorization of noises from the training data. These findings underscore the importance of variance control in deep learning training and offer useful insights for developing more effective training approaches.
    Towards Adversarially Robust Continual Learning. (arXiv:2303.17764v1 [cs.LG])
    Recent studies show that models trained by continual learning can achieve the comparable performances as the standard supervised learning and the learning flexibility of continual learning models enables their wide applications in the real world. Deep learning models, however, are shown to be vulnerable to adversarial attacks. Though there are many studies on the model robustness in the context of standard supervised learning, protecting continual learning from adversarial attacks has not yet been investigated. To fill in this research gap, we are the first to study adversarial robustness in continual learning and propose a novel method called \textbf{T}ask-\textbf{A}ware \textbf{B}oundary \textbf{A}ugmentation (TABA) to boost the robustness of continual learning models. With extensive experiments on CIFAR-10 and CIFAR-100, we show the efficacy of adversarial training and TABA in defending adversarial attacks.
    Learning Internal Representations of 3D Transformations from 2D Projected Inputs. (arXiv:2303.17776v1 [q-bio.NC])
    When interacting in a three dimensional world, humans must estimate 3D structure from visual inputs projected down to two dimensional retinal images. It has been shown that humans use the persistence of object shape over motion-induced transformations as a cue to resolve depth ambiguity when solving this underconstrained problem. With the aim of understanding how biological vision systems may internally represent 3D transformations, we propose a computational model, based on a generative manifold model, which can be used to infer 3D structure from the motion of 2D points. Our model can also learn representations of the transformations with minimal supervision, providing a proof of concept for how humans may develop internal representations on a developmental or evolutionary time scale. Focused on rotational motion, we show how our model infers depth from moving 2D projected points, learns 3D rotational transformations from 2D training stimuli, and compares to human performance on psychophysical structure-from-motion experiments.
    Fused Depthwise Tiling for Memory Optimization in TinyML Deep Neural Network Inference. (arXiv:2303.17878v1 [cs.LG])
    Memory optimization for deep neural network (DNN) inference gains high relevance with the emergence of TinyML, which refers to the deployment of DNN inference tasks on tiny, low-power microcontrollers. Applications such as audio keyword detection or radar-based gesture recognition are heavily constrained by the limited memory on such tiny devices because DNN inference requires large intermediate run-time buffers to store activations and other intermediate data, which leads to high memory usage. In this paper, we propose a new Fused Depthwise Tiling (FDT) method for the memory optimization of DNNs, which, compared to existing tiling methods, reduces memory usage without inducing any run time overhead. FDT applies to a larger variety of network layers than existing tiling methods that focus on convolutions. It improves TinyML memory optimization significantly by reducing memory of models where this was not possible before and additionally providing alternative design points for models that show high run time overhead with existing methods. In order to identify the best tiling configuration, an end-to-end flow with a new path discovery method is proposed, which applies FDT and existing tiling methods in a fully automated way, including the scheduling of the operations and planning of the layout of buffers in memory. Out of seven evaluated models, FDT achieved significant memory reduction for two models by 76.2% and 18.1% where existing tiling methods could not be applied. Two other models showed a significant run time overhead with existing methods and FDT provided alternative design points with no overhead but reduced memory savings.
    Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations. (arXiv:2303.17839v1 [cs.CV])
    The abundance of instructional videos and their narrations over the Internet offers an exciting avenue for understanding procedural activities. In this work, we propose to learn video representation that encodes both action steps and their temporal ordering, based on a large-scale dataset of web instructional videos and their narrations, without using human annotations. Our method jointly learns a video representation to encode individual step concepts, and a deep probabilistic model to capture both temporal dependencies and immense individual variations in the step ordering. We empirically demonstrate that learning temporal ordering not only enables new capabilities for procedure reasoning, but also reinforces the recognition of individual steps. Our model significantly advances the state-of-the-art results on step classification (+2.8% / +3.3% on COIN / EPIC-Kitchens) and step forecasting (+7.4% on COIN). Moreover, our model attains promising results in zero-shot inference for step classification and forecasting, as well as in predicting diverse and plausible steps for incomplete procedures. Our code is available at https://github.com/facebookresearch/ProcedureVRL.
    CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer. (arXiv:2303.17867v1 [cs.CV])
    Content affinity loss including feature and pixel affinity is a main problem which leads to artifacts in photorealistic and video style transfer. This paper proposes a new framework named CAP-VSTNet, which consists of a new reversible residual network and an unbiased linear transform module, for versatile style transfer. This reversible residual network can not only preserve content affinity but not introduce redundant information as traditional reversible networks, and hence facilitate better stylization. Empowered by Matting Laplacian training loss which can address the pixel affinity loss problem led by the linear transform, the proposed framework is applicable and effective on versatile style transfer. Extensive experiments show that CAP-VSTNet can produce better qualitative and quantitative results in comparison with the state-of-the-art methods.
    Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness. (arXiv:2303.17765v1 [stat.ML])
    Representation multi-task learning (MTL) and transfer learning (TL) have achieved tremendous success in practice. However, the theoretical understanding of these methods is still lacking. Most existing theoretical works focus on cases where all tasks share the same representation, and claim that MTL and TL almost always improve performance. However, as the number of tasks grow, assuming all tasks share the same representation is unrealistic. Also, this does not always match empirical findings, which suggest that a shared representation may not necessarily improve single-task or target-only learning performance. In this paper, we aim to understand how to learn from tasks with \textit{similar but not exactly the same} linear representations, while dealing with outlier tasks. We propose two algorithms that are \textit{adaptive} to the similarity structure and \textit{robust} to outlier tasks under both MTL and TL settings. Our algorithms outperform single-task or target-only learning when representations across tasks are sufficiently similar and the fraction of outlier tasks is small. Furthermore, they always perform no worse than single-task learning or target-only learning, even when the representations are dissimilar. We provide information-theoretic lower bounds to show that our algorithms are nearly \textit{minimax} optimal in a large regime.
    The Schr\"odinger Bridge between Gaussian Measures has a Closed Form. (arXiv:2202.05722v2 [cs.LG] UPDATED)
    The static optimal transport $(\mathrm{OT})$ problem between Gaussians seeks to recover an optimal map, or more generally a coupling, to morph a Gaussian into another. It has been well studied and applied to a wide variety of tasks. Here we focus on the dynamic formulation of OT, also known as the Schr\"odinger bridge (SB) problem, which has recently seen a surge of interest in machine learning due to its connections with diffusion-based generative models. In contrast to the static setting, much less is known about the dynamic setting, even for Gaussian distributions. In this paper, we provide closed-form expressions for SBs between Gaussian measures. In contrast to the static Gaussian OT problem, which can be simply reduced to studying convex programs, our framework for solving SBs requires significantly more involved tools such as Riemannian geometry and generator theory. Notably, we establish that the solutions of SBs between Gaussian measures are themselves Gaussian processes with explicit mean and covariance kernels, and thus are readily amenable for many downstream applications such as generative modeling or interpolation. To demonstrate the utility, we devise a new method for modeling the evolution of single-cell genomics data and report significantly improved numerical stability compared to existing SB-based approaches.
    Predictive Context-Awareness for Full-Immersive Multiuser Virtual Reality with Redirected Walking. (arXiv:2303.17907v1 [cs.NI])
    Virtual Reality (VR) technology is being advanced along the lines of enhancing its immersiveness, enabling multiuser Virtual Experiences (VEs), and supporting unconstrained mobility of the users in their VEs, while constraining them within specialized VR setups through Redirected Walking (RDW). For meeting the extreme data-rate and latency requirements of future VR systems, supporting wireless networking infrastructures will operate in millimeter Wave (mmWave) frequencies and leverage highly directional communication in both transmission and reception through beamforming and beamsteering. We propose to leverage predictive context-awareness for optimizing transmitter and receiver-side beamforming and beamsteering. In particular, we argue that short-term prediction of users' lateral movements in multiuser VR setups with RDW can be utilized for optimizing transmitter-side beamforming and beamsteering through Line-of-Sight (LoS) "tracking" in the users' directions. At the same time, short-term prediction of orientational movements can be used for receiver-side beamforming for coverage flexibility enhancements. We target two open problems in predicting these two context information instances: i) lateral movement prediction in multiuser VR settings with RDW and ii) generation of synthetic head rotation datasets to be utilized in the training of existing orientational movements predictors. We follow by experimentally showing that Long Short-Term Memory (LSTM) networks feature promising accuracy in predicting lateral movements, as well as that context-awareness stemming from VEs further benefits this accuracy. Second, we show that a TimeGAN-based approach for orientational data generation can generate synthetic samples closely matching the experimentally obtained ones.
    Concatenated Classic and Neural (CCN) Codes: ConcatenatedAE. (arXiv:2209.01701v2 [cs.IT] UPDATED)
    Small neural networks (NNs) used for error correction were shown to improve on classic channel codes and to address channel model changes. We extend the code dimension of any such structure by using the same NN under one-hot encoding multiple times, then serially-concatenated with an outer classic code. We design NNs with the same network parameters, where each Reed-Solomon codeword symbol is an input to a different NN. Significant improvements in block error probabilities for an additive Gaussian noise channel as compared to the small neural code are illustrated, as well as robustness to channel model changes.
    Partial Optimality in Cubic Correlation Clustering. (arXiv:2302.04694v2 [cs.DM] UPDATED)
    The higher-order correlation clustering problem is an expressive model, and recently, local search heuristics have been proposed for several applications. Certifying optimality, however, is NP-hard and practically hampered already by the complexity of the problem statement. Here, we focus on establishing partial optimality conditions for the special case of complete graphs and cubic objective functions. In addition, we define and implement algorithms for testing these conditions and examine their effect numerically, on two datasets.
    Rethinking interpretation: Input-agnostic saliency mapping of deep visual classifiers. (arXiv:2303.17836v1 [cs.CV])
    Saliency methods provide post-hoc model interpretation by attributing input features to the model outputs. Current methods mainly achieve this using a single input sample, thereby failing to answer input-independent inquiries about the model. We also show that input-specific saliency mapping is intrinsically susceptible to misleading feature attribution. Current attempts to use 'general' input features for model interpretation assume access to a dataset containing those features, which biases the interpretation. Addressing the gap, we introduce a new perspective of input-agnostic saliency mapping that computationally estimates the high-level features attributed by the model to its outputs. These features are geometrically correlated, and are computed by accumulating model's gradient information with respect to an unrestricted data distribution. To compute these features, we nudge independent data points over the model loss surface towards the local minima associated by a human-understandable concept, e.g., class label for classifiers. With a systematic projection, scaling and refinement process, this information is transformed into an interpretable visualization without compromising its model-fidelity. The visualization serves as a stand-alone qualitative interpretation. With an extensive evaluation, we not only demonstrate successful visualizations for a variety of concepts for large-scale models, but also showcase an interesting utility of this new form of saliency mapping by identifying backdoor signatures in compromised classifiers.
    $\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States. (arXiv:2303.18242v1 [cs.LG])
    We introduce $\infty$-Diff, a generative diffusion model which directly operates on infinite resolution data. By randomly sampling subsets of coordinates during training and learning to denoise the content at those coordinates, a continuous function is learned that allows sampling at arbitrary resolutions. In contrast to other recent infinite resolution generative models, our approach operates directly on the raw data, not requiring latent vector compression for context, using hypernetworks, nor relying on discrete components. As such, our approach achieves significantly higher sample quality, as evidenced by lower FID scores, as well as being able to effectively scale to higher resolutions than the training data while retaining detail.
    Adaptive Estimators Show Information Compression in Deep Neural Networks. (arXiv:1902.09037v2 [cs.LG] UPDATED)
    To improve how neural networks function it is crucial to understand their learning process. The information bottleneck theory of deep learning proposes that neural networks achieve good generalization by compressing their representations to disregard information that is not relevant to the task. However, empirical evidence for this theory is conflicting, as compression was only observed when networks used saturating activation functions. In contrast, networks with non-saturating activation functions achieved comparable levels of task performance but did not show compression. In this paper we developed more robust mutual information estimation techniques, that adapt to hidden activity of neural networks and produce more sensitive measurements of activations from all functions, especially unbounded functions. Using these adaptive estimation techniques, we explored compression in networks with a range of different activation functions. With two improved methods of estimation, firstly, we show that saturation of the activation function is not required for compression, and the amount of compression varies between different activation functions. We also find that there is a large amount of variation in compression between different network initializations. Secondary, we see that L2 regularization leads to significantly increased compression, while preventing overfitting. Finally, we show that only compression of the last layer is positively correlated with generalization.
    Rapid prediction of lab-grown tissue properties using deep learning. (arXiv:2303.18017v1 [q-bio.TO])
    The interactions between cells and the extracellular matrix are vital for the self-organisation of tissues. In this paper we present proof-of-concept to use machine learning tools to predict the role of this mechanobiology in the self-organisation of cell-laden hydrogels grown in tethered moulds. We develop a process for the automated generation of mould designs with and without key symmetries. We create a large training set with $N=6500$ cases by running detailed biophysical simulations of cell-matrix interactions using the contractile network dipole orientation (CONDOR) model for the self-organisation of cellular hydrogels within these moulds. These are used to train an implementation of the \texttt{pix2pix} deep learning model, reserving $740$ cases that were unseen in the training of the neural network for training and validation. Comparison between the predictions of the machine learning technique and the reserved predictions from the biophysical algorithm show that the machine learning algorithm makes excellent predictions. The machine learning algorithm is significantly faster than the biophysical method, opening the possibility of very high throughput rational design of moulds for pharmaceutical testing, regenerative medicine and fundamental studies of biology. Future extensions for scaffolds and 3D bioprinting will open additional applications.
    SimTS: Rethinking Contrastive Representation Learning for Time Series Forecasting. (arXiv:2303.18205v1 [cs.LG])
    Contrastive learning methods have shown an impressive ability to learn meaningful representations for image or time series classification. However, these methods are less effective for time series forecasting, as optimization of instance discrimination is not directly applicable to predicting the future state from the history context. Moreover, the construction of positive and negative pairs in current technologies strongly relies on specific time series characteristics, restricting their generalization across diverse types of time series data. To address these limitations, we propose SimTS, a simple representation learning approach for improving time series forecasting by learning to predict the future from the past in the latent space. SimTS does not rely on negative pairs or specific assumptions about the characteristics of the particular time series. Our extensive experiments on several benchmark time series forecasting datasets show that SimTS achieves competitive performance compared to existing contrastive learning methods. Furthermore, we show the shortcomings of the current contrastive learning framework used for time series forecasting through a detailed ablation study. Overall, our work suggests that SimTS is a promising alternative to other contrastive learning approaches for time series forecasting.
    Logical Message Passing Networks with One-hop Inference on Atomic Formulas. (arXiv:2301.08859v3 [cs.LG] UPDATED)
    Complex Query Answering (CQA) over Knowledge Graphs (KGs) has attracted a lot of attention to potentially support many applications. Given that KGs are usually incomplete, neural models are proposed to answer the logical queries by parameterizing set operators with complex neural networks. However, such methods usually train neural set operators with a large number of entity and relation embeddings from the zero, where whether and how the embeddings or the neural set operators contribute to the performance remains not clear. In this paper, we propose a simple framework for complex query answering that decomposes the KG embeddings from neural set operators. We propose to represent the complex queries into the query graph. On top of the query graph, we propose the Logical Message Passing Neural Network (LMPNN) that connects the local one-hop inferences on atomic formulas to the global logical reasoning for complex query answering. We leverage existing effective KG embeddings to conduct one-hop inferences on atomic formulas, the results of which are regarded as the messages passed in LMPNN. The reasoning process over the overall logical formulas is turned into the forward pass of LMPNN that incrementally aggregates local information to finally predict the answers' embeddings. The complex logical inference across different types of queries will then be learned from training examples based on the LMPNN architecture. Theoretically, our query-graph represenation is more general than the prevailing operator-tree formulation, so our approach applies to a broader range of complex KG queries. Empirically, our approach yields the new state-of-the-art neural CQA model. Our research bridges the gap between complex KG query answering tasks and the long-standing achievements of knowledge graph representation learning.
    Unrolled Graph Learning for Multi-Agent Collaboration. (arXiv:2210.17101v2 [cs.LG] UPDATED)
    Multi-agent learning has gained increasing attention to tackle distributed machine learning scenarios under constrictions of data exchanging. However, existing multi-agent learning models usually consider data fusion under fixed and compulsory collaborative relations among agents, which is not as flexible and autonomous as human collaboration. To fill this gap, we propose a distributed multi-agent learning model inspired by human collaboration, in which the agents can autonomously detect suitable collaborators and refer to collaborators' model for better performance. To implement such adaptive collaboration, we use a collaboration graph to indicate the pairwise collaborative relation. The collaboration graph can be obtained by graph learning techniques based on model similarity between different agents. Since model similarity can not be formulated by a fixed graphical optimization, we design a graph learning network by unrolling, which can learn underlying similar features among potential collaborators. By testing on both regression and classification tasks, we validate that our proposed collaboration model can figure out accurate collaborative relationship and greatly improve agents' learning performance.
    Learning Treatment Effects in Panels with General Intervention Patterns. (arXiv:2106.02780v2 [stat.ML] UPDATED)
    The problem of causal inference with panel data is a central econometric question. The following is a fundamental version of this problem: Let $M^*$ be a low rank matrix and $E$ be a zero-mean noise matrix. For a `treatment' matrix $Z$ with entries in $\{0,1\}$ we observe the matrix $O$ with entries $O_{ij} := M^*_{ij} + E_{ij} + \mathcal{T}_{ij} Z_{ij}$ where $\mathcal{T}_{ij} $ are unknown, heterogenous treatment effects. The problem requires we estimate the average treatment effect $\tau^* := \sum_{ij} \mathcal{T}_{ij} Z_{ij} / \sum_{ij} Z_{ij}$. The synthetic control paradigm provides an approach to estimating $\tau^*$ when $Z$ places support on a single row. This paper extends that framework to allow rate-optimal recovery of $\tau^*$ for general $Z$, thus broadly expanding its applicability. Our guarantees are the first of their type in this general setting. Computational experiments on synthetic and real-world data show a substantial advantage over competing estimators.
    Never a Dull Moment: Distributional Properties as a Baseline for Time-Series Classification. (arXiv:2303.17809v1 [stat.ME])
    The variety of complex algorithmic approaches for tackling time-series classification problems has grown considerably over the past decades, including the development of sophisticated but challenging-to-interpret deep-learning-based methods. But without comparison to simpler methods it can be difficult to determine when such complexity is required to obtain strong performance on a given problem. Here we evaluate the performance of an extremely simple classification approach -- a linear classifier in the space of two simple features that ignore the sequential ordering of the data: the mean and standard deviation of time-series values. Across a large repository of 128 univariate time-series classification problems, this simple distributional moment-based approach outperformed chance on 69 problems, and reached 100% accuracy on two problems. With a neuroimaging time-series case study, we find that a simple linear model based on the mean and standard deviation performs better at classifying individuals with schizophrenia than a model that additionally includes features of the time-series dynamics. Comparing the performance of simple distributional features of a time series provides important context for interpreting the performance of complex time-series classification models, which may not always be required to obtain high accuracy.
    HD-GCN:A Hybrid Diffusion Graph Convolutional Network. (arXiv:2303.17966v1 [cs.LG])
    The information diffusion performance of GCN and its variant models is limited by the adjacency matrix, which can lower their performance. Therefore, we introduce a new framework for graph convolutional networks called Hybrid Diffusion-based Graph Convolutional Network (HD-GCN) to address the limitations of information diffusion caused by the adjacency matrix. In the HD-GCN framework, we initially utilize diffusion maps to facilitate the diffusion of information among nodes that are adjacent to each other in the feature space. This allows for the diffusion of information between similar points that may not have an adjacent relationship. Next, we utilize graph convolution to further propagate information among adjacent nodes after the diffusion maps, thereby enabling the spread of information among similar nodes that are adjacent in the graph. Finally, we employ the diffusion distances obtained through the use of diffusion maps to regularize and constrain the predicted labels of training nodes. This regularization method is then applied to the HD-GCN training, resulting in a smoother classification surface. The model proposed in this paper effectively overcomes the limitations of information diffusion imposed only by the adjacency matrix. HD-GCN utilizes hybrid diffusion by combining information diffusion between neighborhood nodes in the feature space and adjacent nodes in the adjacency matrix. This method allows for more comprehensive information propagation among nodes, resulting in improved model performance. We evaluated the performance of DM-GCN on three well-known citation network datasets and the results showed that the proposed framework is more effective than several graph-based semi-supervised learning methods.
    Scardina: Scalable Join Cardinality Estimation by Multiple Density Estimators. (arXiv:2303.18042v1 [cs.DB])
    In recent years, machine learning-based cardinality estimation methods are replacing traditional methods. This change is expected to contribute to one of the most important applications of cardinality estimation, the query optimizer, to speed up query processing. However, none of the existing methods do not precisely estimate cardinalities when relational schemas consist of many tables with strong correlations between tables/attributes. This paper describes that multiple density estimators can be combined to effectively target the cardinality estimation of data with large and complex schemas having strong correlations. We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.
    FP8 versus INT8 for efficient deep learning inference. (arXiv:2303.17951v1 [cs.LG])
    Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed-precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive training procedures in deep learning. A natural question arises regarding what this development means for efficient inference on edge devices. In the efficient inference device world, workloads are frequently executed in INT8. Sometimes going even as low as INT4 when efficiency calls for it. In this whitepaper, we compare the performance for both the FP8 and INT formats for efficient on-device inference. We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware-training results to show how this theory translates to practice. We also provide a hardware analysis showing that the FP formats are somewhere between 50-180% less efficient in terms of compute in dedicated hardware than the INT format. Based on our research and a read of the research field, we conclude that although the proposed FP8 format could be good for training, the results for inference do not warrant a dedicated implementation of FP8 in favor of INT8 for efficient inference. We show that our results are mostly consistent with previous findings but that important comparisons between the formats have thus far been lacking. Finally, we discuss what happens when FP8-trained networks are converted to INT8 and conclude with a brief discussion on the most efficient way for on-device deployment and an extensive suite of INT8 results for many models.
    Adaptive joint distribution learning. (arXiv:2110.04829v2 [stat.ML] UPDATED)
    We develop a new framework for embedding joint probability distributions in tensor product reproducing kernel Hilbert spaces (RKHS). Our framework accommodates a low-dimensional, normalized and positive model of a Radon-Nikodym derivative, which we estimate from sample sizes of up to several million data points, alleviating the inherent limitations of RKHS modeling. Well-defined normalized and positive conditional distributions are natural by-products to our approach. The embedding is fast to compute and accommodates learning problems ranging from prediction to classification. Our theoretical findings are supplemented by favorable numerical results.
    Robust and IP-Protecting Vertical Federated Learning against Unexpected Quitting of Parties. (arXiv:2303.18178v1 [cs.CR])
    Vertical federated learning (VFL) enables a service provider (i.e., active party) who owns labeled features to collaborate with passive parties who possess auxiliary features to improve model performance. Existing VFL approaches, however, have two major vulnerabilities when passive parties unexpectedly quit in the deployment phase of VFL - severe performance degradation and intellectual property (IP) leakage of the active party's labels. In this paper, we propose \textbf{Party-wise Dropout} to improve the VFL model's robustness against the unexpected exit of passive parties and a defense method called \textbf{DIMIP} to protect the active party's IP in the deployment phase. We evaluate our proposed methods on multiple datasets against different inference attacks. The results show that Party-wise Dropout effectively maintains model performance after the passive party quits, and DIMIP successfully disguises label information from the passive party's feature extractor, thereby mitigating IP leakage.
    Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data. (arXiv:2301.00437v3 [cs.LG] UPDATED)
    Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse ($\mathcal{NC}$). Recent papers have theoretically shown that $\mathcal{NC}$ emerges in the global minimizers of training problems with the simplified ``unconstrained feature model''. In this context, we take a step further and prove the $\mathcal{NC}$ occurrences in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit $\mathcal{NC}$ properties across the linear layers. Furthermore, we extend our study to imbalanced data for MSE loss and present the first geometric analysis of $\mathcal{NC}$ under bias-free setting. Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors, whose lengths depend on the amount of data in their corresponding classes. Finally, we empirically validate our theoretical analyses on synthetic and practical network architectures with both balanced and imbalanced scenarios.
    A Slow-Shifting Concerned Machine Learning Method for Short-term Traffic Flow Forecasting. (arXiv:2303.17782v1 [cs.LG])
    The ability to predict traffic flow over time for crowded areas during rush hours is increasingly important as it can help authorities make informed decisions for congestion mitigation or scheduling of infrastructure development in an area. However, a crucial challenge in traffic flow forecasting is the slow shifting in temporal peaks between daily and weekly cycles, resulting in the nonstationarity of the traffic flow signal and leading to difficulty in accurate forecasting. To address this challenge, we propose a slow shifting concerned machine learning method for traffic flow forecasting, which includes two parts. First, we take advantage of Empirical Mode Decomposition as the feature engineering to alleviate the nonstationarity of traffic flow data, yielding a series of stationary components. Second, due to the superiority of Long-Short-Term-Memory networks in capturing temporal features, an advanced traffic flow forecasting model is developed by taking the stationary components as inputs. Finally, we apply this method on a benchmark of real-world data and provide a comparison with other existing methods. Our proposed method outperforms the state-of-art results by 14.55% and 62.56% using the metrics of root mean squared error and mean absolute percentage error, respectively.
    Constrained Optimization of Rank-One Functions with Indicator Variables. (arXiv:2303.18158v1 [math.OC])
    Optimization problems involving minimization of a rank-one convex function over constraints modeling restrictions on the support of the decision variables emerge in various machine learning applications. These problems are often modeled with indicator variables for identifying the support of the continuous variables. In this paper we investigate compact extended formulations for such problems through perspective reformulation techniques. In contrast to the majority of previous work that relies on support function arguments and disjunctive programming techniques to provide convex hull results, we propose a constructive approach that exploits a hidden conic structure induced by perspective functions. To this end, we first establish a convex hull result for a general conic mixed-binary set in which each conic constraint involves a linear function of independent continuous variables and a set of binary variables. We then demonstrate that extended representations of sets associated with epigraphs of rank-one convex functions over constraints modeling indicator relations naturally admit such a conic representation. This enables us to systematically give perspective formulations for the convex hull descriptions of these sets with nonlinear separable or non-separable objective functions, sign constraints on continuous variables, and combinatorial constraints on indicator variables. We illustrate the efficacy of our results on sparse nonnegative logistic regression problems.
    Estimating truncation effects of quantum bosonic systems using sampling algorithms. (arXiv:2212.08546v2 [quant-ph] UPDATED)
    To simulate bosons on a qubit- or qudit-based quantum computer, one has to regularize the theory by truncating infinite-dimensional local Hilbert spaces to finite dimensions. In the search for practical quantum applications, it is important to know how big the truncation errors can be. In general, it is not easy to estimate errors unless we have a good quantum computer. In this paper we show that traditional sampling methods on classical devices, specifically Markov Chain Monte Carlo, can address this issue with a reasonable amount of computational resources available today. As a demonstration, we apply this idea to the scalar field theory on a two-dimensional lattice, with a size that goes beyond what is achievable using exact diagonalization methods. This method can be used to estimate the resources needed for realistic quantum simulations of bosonic theories, and also, to check the validity of the results of the corresponding quantum simulations.
    Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery. (arXiv:2210.03516v2 [cs.NE] UPDATED)
    Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks. However, these policies tend to be overfit to the exact specifications of the task and environment they were trained on, and thus do not perform well when conditions deviate slightly or when composed hierarchically to solve even more complex tasks. Recent work has shown that training a mixture of policies, as opposed to a single one, that are driven to explore different regions of the state-action space can address this shortcoming by generating a diverse set of behaviors, referred to as skills, that can be collectively used to great effect in adaptation tasks or for hierarchical planning. This is typically realized by including a diversity term - often derived from information theory - in the objective function optimized by RL. However these approaches often require careful hyperparameter tuning to be effective. In this work, we demonstrate that less widely-used neuroevolution methods, specifically Quality Diversity (QD), are a competitive alternative to information-theory-augmented RL for skill discovery. Through an extensive empirical evaluation comparing eight state-of-the-art algorithms (four flagship algorithms from each line of work) on the basis of (i) metrics directly evaluating the skills' diversity, (ii) the skills' performance on adaptation tasks, and (iii) the skills' performance when used as primitives for hierarchical planning; QD methods are found to provide equal, and sometimes improved, performance whilst being less sensitive to hyperparameters and more scalable. As no single method is found to provide near-optimal performance across all environments, there is a rich scope for further research which we support by proposing future directions and providing optimized open-source implementations.
    Implicit Graphon Neural Representation. (arXiv:2211.03329v3 [cs.LG] UPDATED)
    Graphons are general and powerful models for generating graphs of varying size. In this paper, we propose to directly model graphons using neural networks, obtaining Implicit Graphon Neural Representation (IGNR). Existing work in modeling and reconstructing graphons often approximates a target graphon by a fixed resolution piece-wise constant representation. Our IGNR has the benefit that it can represent graphons up to arbitrary resolutions, and enables natural and efficient generation of arbitrary sized graphs with desired structure once the model is learned. Furthermore, we allow the input graph data to be unaligned and have different sizes by leveraging the Gromov-Wasserstein distance. We first demonstrate the effectiveness of our model by showing its superior performance on a graphon learning task. We then propose an extension of IGNR that can be incorporated into an auto-encoder framework, and demonstrate its good performance under a more general setting of graphon learning. We also show that our model is suitable for graph representation learning and graph generation.
    Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. (arXiv:2301.08243v2 [cs.CV] UPDATED)
    This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.
    On Image Segmentation With Noisy Labels: Characterization and Volume Properties of the Optimal Solutions to Accuracy and Dice. (arXiv:2206.06484v4 [cs.CV] UPDATED)
    We study two of the most popular performance metrics in medical image segmentation, Accuracy and Dice, when the target labels are noisy. For both metrics, several statements related to characterization and volume properties of the set of optimal segmentations are proved, and associated experiments are provided. Our main insights are: (i) the volume of the solutions to both metrics may deviate significantly from the expected volume of the target, (ii) the volume of a solution to Accuracy is always less than or equal to the volume of a solution to Dice and (iii) the optimal solutions to both of these metrics coincide when the set of feasible segmentations is constrained to the set of segmentations with the volume equal to the expected volume of the target.
    Neural Network Entropy (NNetEn): EEG Signals and Chaotic Time Series Separation by Entropy Features, Python Package for NNetEn Calculation. (arXiv:2303.17995v1 [cs.LG])
    Entropy measures are effective features for time series classification problems. Traditional entropy measures, such as Shannon entropy, use probability distribution function. However, for the effective separation of time series, new entropy estimation methods are required to characterize the chaotic dynamic of the system. Our concept of Neural Network Entropy (NNetEn) is based on the classification of special datasets (MNIST-10 and SARS-CoV-2-RBV1) in relation to the entropy of the time series recorded in the reservoir of the LogNNet neural network. NNetEn estimates the chaotic dynamics of time series in an original way. Based on the NNetEn algorithm, we propose two new classification metrics: R2 Efficiency and Pearson Efficiency. The efficiency of NNetEn is verified on separation of two chaotic time series of sine mapping using dispersion analysis (ANOVA). For two close dynamic time series (r = 1.1918 and r = 1.2243), the F-ratio has reached the value of 124 and reflects high efficiency of the introduced method in classification problems. The EEG signal classification for healthy persons and patients with Alzheimer disease illustrates the practical application of the NNetEn features. Our computations demonstrate the synergistic effect of increasing classification accuracy when applying traditional entropy measures and the NNetEn concept conjointly. An implementation of the algorithms in Python is presented.
    Deep neural operator for learning transient response of interpenetrating phase composites subject to dynamic loading. (arXiv:2303.18055v1 [cs.LG])
    Additive manufacturing has been recognized as an industrial technological revolution for manufacturing, which allows fabrication of materials with complex three-dimensional (3D) structures directly from computer-aided design models. The mechanical properties of interpenetrating phase composites (IPCs), especially response to dynamic loading, highly depend on their 3D structures. In general, for each specified structural design, it could take hours or days to perform either finite element analysis (FEA) or experiments to test the mechanical response of IPCs to a given dynamic load. To accelerate the physics-based prediction of mechanical properties of IPCs for various structural designs, we employ a deep neural operator (DNO) to learn the transient response of IPCs under dynamic loading as surrogate of physics-based FEA models. We consider a 3D IPC beam formed by two metals with a ratio of Young's modulus of 2.7, wherein random blocks of constituent materials are used to demonstrate the generality and robustness of the DNO model. To obtain FEA results of IPC properties, 5,000 random time-dependent strain loads generated by a Gaussian process kennel are applied to the 3D IPC beam, and the reaction forces and stress fields inside the IPC beam under various loading are collected. Subsequently, the DNO model is trained using an incremental learning method with sequence-to-sequence training implemented in JAX, leading to a 100X speedup compared to widely used vanilla deep operator network models. After an offline training, the DNO model can act as surrogate of physics-based FEA to predict the transient mechanical response in terms of reaction force and stress distribution of the IPCs to various strain loads in one second at an accuracy of 98%. Also, the learned operator is able to provide extended prediction of the IPC beam subject to longer random strain loads at a reasonably well accuracy.
    GVP: Generative Volumetric Primitives. (arXiv:2303.18193v1 [cs.CV])
    Advances in 3D-aware generative models have pushed the boundary of image synthesis with explicit camera control. To achieve high-resolution image synthesis, several attempts have been made to design efficient generators, such as hybrid architectures with both 3D and 2D components. However, such a design compromises multiview consistency, and the design of a pure 3D generator with high resolution is still an open problem. In this work, we present Generative Volumetric Primitives (GVP), the first pure 3D generative model that can sample and render 512-resolution images in real-time. GVP jointly models a number of volumetric primitives and their spatial information, both of which can be efficiently generated via a 2D convolutional network. The mixture of these primitives naturally captures the sparsity and correspondence in the 3D volume. The training of such a generator with a high degree of freedom is made possible through a knowledge distillation technique. Experiments on several datasets demonstrate superior efficiency and 3D consistency of GVP over the state-of-the-art.
    Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach. (arXiv:2303.17949v1 [cs.SD])
    Automatic detection of machine anomaly remains challenging for machine learning. We believe the capability of generative adversarial network (GAN) suits the need of machine audio anomaly detection, yet rarely has this been investigated by previous work. In this paper, we propose AEGAN-AD, a totally unsupervised approach in which the generator (also an autoencoder) is trained to reconstruct input spectrograms. It is pointed out that the denoising nature of reconstruction deprecates its capacity. Thus, the discriminator is redesigned to aid the generator during both training stage and detection stage. The performance of AEGAN-AD on the dataset of DCASE 2022 Challenge TASK 2 demonstrates the state-of-the-art result on five machine types. A novel anomaly localization method is also investigated. Source code available at: www.github.com/jianganbai/AEGAN-AD
    Differentially Private Vertical Federated Clustering. (arXiv:2208.01700v2 [cs.CR] UPDATED)
    In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes, and a server wants to leverage the data to train a model. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques, where the data parties share only information for training the model, instead of the private data. However, it is challenging to ensure that the shared information maintains privacy while learning accurate models. To the best of our knowledge, the algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering, where the server can obtain a set of global centers with a provable differential privacy guarantee. Our algorithm assumes an untrusted central server that aggregates differentially private local centers and membership encodings from local data parties. It builds a weighted grid as the synopsis of the global dataset based on the received information. Final centers are generated by running any k-means algorithm on the weighted grid. Our approach for grid weight estimation uses a novel, light-weight, and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin sketch. To improve the estimation accuracy in the setting with more than two data parties, we further propose a refined version of the weights estimation algorithm and a parameter tuning strategy to reduce the final k-means utility to be close to that in the central private setting. We provide theoretical utility analysis and experimental evaluation results for the cluster centers computed by our algorithm and show that our approach performs better both theoretically and empirically than the two baselines based on existing techniques.
    Artificial Intelligence in Ovarian Cancer Histopathology: A Systematic Review. (arXiv:2303.18005v1 [eess.IV])
    Purpose - To characterise and assess the quality of published research evaluating artificial intelligence (AI) methods for ovarian cancer diagnosis or prognosis using histopathology data. Methods - A search of 5 sources was conducted up to 01/12/2022. The inclusion criteria required that research evaluated AI on histopathology images for diagnostic or prognostic inferences in ovarian cancer, including tubo-ovarian and peritoneal tumours. Reviews and non-English language articles were excluded. The risk of bias was assessed for every included model using PROBAST. Results - A total of 1434 research articles were identified, of which 36 were eligible for inclusion. These studies reported 62 models of interest, including 35 classifiers, 14 survival prediction models, 7 segmentation models, and 6 regression models. Models were developed using 1-1375 slides from 1-664 ovarian cancer patients. A wide array of outcomes were predicted, including overall survival (9/62), histological subtypes (7/62), stain quantity (6/62) and malignancy (5/62). Older studies used traditional machine learning (ML) models with hand-crafted features, while newer studies typically employed deep learning (DL) to automatically learn features and predict the outcome(s) of interest. All models were found to be at high or unclear risk of bias overall. Research was frequently limited by insufficient reporting, small sample sizes, and insufficient validation. Conclusion - Limited research has been conducted and none of the associated models have been demonstrated to be ready for real-world implementation. Recommendations are provided addressing underlying biases and flaws in study design, which should help inform higher-quality reproducible future research. Key aspects include more transparent and comprehensive reporting, and improved performance evaluation using cross-validation and external validations.
    NOSTROMO: Lessons learned, conclusions and way forward. (arXiv:2303.18060v1 [cs.LG])
    This White Paper sets out to explain the value that metamodelling can bring to air traffic management (ATM) research. It will define metamodelling and explore what it can, and cannot, do. The reader is assumed to have basic knowledge of SESAR: the Single European Sky ATM Research project. An important element of SESAR, as the technological pillar of the Single European Sky initiative, is to bring about improvements, as measured through specific key performance indicators (KPIs), and as implemented by a series of so-called SESAR 'Solutions'. These 'Solutions' are new or improved operational procedures or technologies, designed to meet operational and performance improvements described in the European ATM Master Plan.
    On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks. (arXiv:2303.17805v1 [cs.LG])
    In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized with zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks with non-zero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal transport theory, we show that, despite the non-convexity of the 2-layer network training, this problem admits an infinite dimensional convex counterpart. We formulate the corresponding functional optimization problem and investigate its main properties. In particular, we show that as the scale of the initialization ranges between $0$ and $+\infty$, the associated path interpolates continuously between the so-called kernel and rich regimes. The numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly even beyond these extreme points.
    Towards Verifying the Geometric Robustness of Large-scale Neural Networks. (arXiv:2301.12456v2 [cs.LG] UPDATED)
    Deep neural networks (DNNs) are known to be vulnerable to adversarial geometric transformation. This paper aims to verify the robustness of large-scale DNNs against the combination of multiple geometric transformations with a provable guarantee. Given a set of transformations (e.g., rotation, scaling, etc.), we develop GeoRobust, a black-box robustness analyser built upon a novel global optimisation strategy, for locating the worst-case combination of transformations that affect and even alter a network's output. GeoRobust can provide provable guarantees on finding the worst-case combination based on recent advances in Lipschitzian theory. Due to its black-box nature, GeoRobust can be deployed on large-scale DNNs regardless of their architectures, activation functions, and the number of neurons. In practice, GeoRobust can locate the worst-case geometric transformation with high precision for the ResNet50 model on ImageNet in a few seconds on average. We examined 18 ImageNet classifiers, including the ResNet family and vision transformers, and found a positive correlation between the geometric robustness of the networks and the parameter numbers. We also observe that increasing the depth of DNN is more beneficial than increasing its width in terms of improving its geometric robustness. Our tool GeoRobust is available at https://github.com/TrustAI/GeoRobust.
    Time-series Anomaly Detection based on Difference Subspace between Signal Subspaces. (arXiv:2303.17802v1 [cs.LG])
    This paper proposes a new method for anomaly detection in time-series data by incorporating the concept of difference subspace into the singular spectrum analysis (SSA). The key idea is to monitor slight temporal variations of the difference subspace between two signal subspaces corresponding to the past and present time-series data, as anomaly score. It is a natural generalization of the conventional SSA-based method which measures the minimum angle between the two signal subspaces as the degree of changes. By replacing the minimum angle with the difference subspace, our method boosts the performance while using the SSA-based framework as it can capture the whole structural difference between the two subspaces in its magnitude and direction. We demonstrate our method's effectiveness through performance evaluations on public time-series datasets.
    CoSMo: a Framework for Implementing Conditioned Process Simulation Models. (arXiv:2303.17879v1 [cs.AI])
    Process simulation is an analysis tool in process mining that allows users to measure the impact of changes, prevent losses, and update the process without risks or costs. In the literature, several process simulation techniques are available and they are usually built upon process models discovered from a given event log or learned via deep learning. Each group of approaches has its own strengths and limitations. The former is usually restricted to the control-flow but it is more interpretable, whereas the latter is not interpretable by nature but has a greater generalization capability on large event logs. Despite the great performance achieved by deep learning approaches, they are still not suitable to be applied to real scenarios and generate value for users. This issue is mainly due to fact their stochasticity is hard to control. To address this problem, we propose the CoSMo framework for implementing process simulation models fully based on deep learning. This framework enables simulating event logs that satisfy a constraint by conditioning the learning phase of a deep neural network. Throughout experiments, the simulation is validated from both control-flow and data-flow perspectives, demonstrating the proposed framework's capability of simulating cases while satisfying imposed conditions.
    Exploring the Potential of Large Language models in Traditional Korean Medicine: A Foundation Model Approach to Culturally-Adapted Healthcare. (arXiv:2303.17807v1 [cs.CL])
    Introduction: Traditional Korean medicine (TKM) emphasizes individualized diagnosis and treatment, making AI modeling difficult due to limited data and implicit processes. GPT-3.5 and GPT-4, large language models, have shown impressive medical knowledge despite lacking medicine-specific training. This study aimed to assess the capabilities of GPT-3.5 and GPT-4 for TKM using the Korean National Licensing Examination for Korean Medicine Doctors. Methods: GPT-3.5 (February 2023) and GPT-4 (March 2023) models answered 340 questions from the 2022 examination across 12 subjects. Each question was independently evaluated five times in an initialized session. Results: GPT-3.5 and GPT-4 achieved 42.06% and 57.29% accuracy, respectively, with GPT-4 nearing passing performance. There were significant differences in accuracy by subjects, with 83.75% accuracy for neuropsychiatry compared to 28.75% for internal medicine (2). Both models showed high accuracy in recall-based and diagnosis-based questions but struggled with intervention-based ones. The accuracy for questions that require TKM-specialized knowledge was relatively lower than the accuracy for questions that do not GPT-4 showed high accuracy for table-based questions, and both models demonstrated consistent responses. A positive correlation between consistency and accuracy was observed. Conclusion: Models in this study showed near-passing performance in decision-making for TKM without domain-specific training. However, limits were also observed that were believed to be caused by culturally-biased learning. Our study suggests that foundation models have potential in culturally-adapted medicine, specifically TKM, for clinical assistance, medical education, and medical research.
    A Closer Look at Parameter-Efficient Tuning in Diffusion Models. (arXiv:2303.18181v1 [cs.CV])
    Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications while customizing such models by fine-tuning is both memory and time inefficient. Motivated by the recent progress in natural language processing, we investigate parameter-efficient tuning in large diffusion models by inserting small learnable modules (termed adapters). In particular, we decompose the design space of adapters into orthogonal factors -- the input position, the output position as well as the function form, and perform Analysis of Variance (ANOVA), a classical statistical approach for analyzing the correlation between discrete (design options) and continuous variables (evaluation metrics). Our analysis suggests that the input position of adapters is the critical factor influencing the performance of downstream tasks. Then, we carefully study the choice of the input position, and we find that putting the input position after the cross-attention block can lead to the best performance, validated by additional visualization analyses. Finally, we provide a recipe for parameter-efficient tuning in diffusion models, which is comparable if not superior to the fully fine-tuned baseline (e.g., DreamBooth) with only 0.75 \% extra parameters, across various customized tasks.
    An interpretable neural network-based non-proportional odds model for ordinal regression with continuous response. (arXiv:2303.17823v1 [stat.ME])
    This paper proposes an interpretable neural network-based non-proportional odds model (N$^3$POM) for ordinal regression, where the response variable can take not only discrete but also continuous values, and the regression coefficients vary depending on the predicting ordinal response. In contrast to conventional approaches estimating the linear coefficients of regression directly from the discrete response, we train a non-linear neural network that outputs the linear coefficients by taking the response as its input. By virtue of the neural network, N$^3$POM may have flexibility while preserving the interpretability of the conventional ordinal regression. We show a sufficient condition so that the predicted conditional cumulative probability~(CCP) satisfies the monotonicity constraint locally over a user-specified region in the covariate space; we also provide a monotonicity-preserving stochastic (MPS) algorithm for training the neural network adequately.
    A Benchmark Generative Probabilistic Model for Weak Supervised Learning. (arXiv:2303.17841v1 [cs.LG])
    Finding relevant and high-quality datasets to train machine learning models is a major bottleneck for practitioners. Furthermore, to address ambitious real-world use-cases there is usually the requirement that the data come labelled with high-quality annotations that can facilitate the training of a supervised model. Manually labelling data with high-quality labels is generally a time-consuming and challenging task and often this turns out to be the bottleneck in a machine learning project. Weak Supervised Learning (WSL) approaches have been developed to alleviate the annotation burden by offering an automatic way of assigning approximate labels (pseudo-labels) to unlabelled data based on heuristics, distant supervision and knowledge bases. We apply probabilistic generative latent variable models (PLVMs), trained on heuristic labelling representations of the original dataset, as an accurate, fast and cost-effective way to generate pseudo-labels. We show that the PLVMs achieve state-of-the-art performance across four datasets. For example, they achieve 22% points higher F1 score than Snorkel in the class-imbalanced Spouse dataset. PLVMs are plug-and-playable and are a drop-in replacement to existing WSL frameworks (e.g. Snorkel) or they can be used as benchmark models for more complicated algorithms, giving practitioners a compelling accuracy boost.
    Continual Learning of Multi-modal Dynamics with External Memory. (arXiv:2203.00936v3 [cs.LG] UPDATED)
    We study the problem of fitting a model to a dynamical environment when new modes of behavior emerge sequentially. The learning model is aware when a new mode appears, but it does not have access to the true modes of individual training sequences. The state-of-the-art continual learning approaches cannot handle this setup, because parameter transfer suffers from catastrophic interference and episodic memory design requires the knowledge of the ground-truth modes of sequences. We devise a novel continual learning method that overcomes both limitations by maintaining a descriptor of the mode of an encountered sequence in a neural episodic memory. We employ a Dirichlet Process prior on the attention weights of the memory to foster efficient storage of the mode descriptors. Our method performs continual learning by transferring knowledge across tasks by retrieving the descriptors of similar modes of past tasks to the mode of a current sequence and feeding this descriptor into its transition kernel as control input. We observe the continual learning performance of our method to compare favorably to the mainstream parameter transfer approach.
    Dual Cross-Attention for Medical Image Segmentation. (arXiv:2303.17696v1 [eess.IV])
    We propose Dual Cross-Attention (DCA), a simple yet effective attention module that is able to enhance skip-connections in U-Net-based architectures for medical image segmentation. DCA addresses the semantic gap between encoder and decoder features by sequentially capturing channel and spatial dependencies across multi-scale encoder features. First, the Channel Cross-Attention (CCA) extracts global channel-wise dependencies by utilizing cross-attention across channel tokens of multi-scale encoder features. Then, the Spatial Cross-Attention (SCA) module performs cross-attention to capture spatial dependencies across spatial tokens. Finally, these fine-grained encoder features are up-sampled and connected to their corresponding decoder parts to form the skip-connection scheme. Our proposed DCA module can be integrated into any encoder-decoder architecture with skip-connections such as U-Net and its variants. We test our DCA module by integrating it into six U-Net-based architectures such as U-Net, V-Net, R2Unet, ResUnet++, DoubleUnet and MultiResUnet. Our DCA module shows Dice Score improvements up to 2.05% on GlaS, 2.74% on MoNuSeg, 1.37% on CVC-ClinicDB, 1.12% on Kvasir-Seg and 1.44% on Synapse datasets. Our codes are available at: https://github.com/gorkemcanates/Dual-Cross-Attention
    Traffic Sign Recognition Dataset and Data Augmentation. (arXiv:2303.18037v1 [cs.CV])
    Although there are many datasets for traffic sign classification, there are few datasets collected for traffic sign recognition and few of them obtain enough instances especially for training a model with the deep learning method. The deep learning method is almost the only way to train a model for real-world usage that covers various highly similar classes compared with the traditional way such as through color, shape, etc. Also, for some certain sign classes, their sign meanings were destined to can't get enough instances in the dataset. To solve this problem, we purpose a unique data augmentation method for the traffic sign recognition dataset that takes advantage of the standard of the traffic sign. We called it TSR dataset augmentation. We based on the benchmark Tsinghua-Tencent 100K (TT100K) dataset to verify the unique data augmentation method. we performed the method on four main iteration version datasets based on the TT100K dataset and the experimental results showed our method is efficacious. The iteration version datasets based on TT100K, data augmentation method source code and the training results introduced in this paper are publicly available.
    Towards Global Neural Network Abstractions with Locally-Exact Reconstruction. (arXiv:2210.12054v2 [cs.LG] UPDATED)
    Neural networks are a powerful class of non-linear functions. However, their black-box nature makes it difficult to explain their behaviour and certify their safety. Abstraction techniques address this challenge by transforming the neural network into a simpler, over-approximated function. Unfortunately, existing abstraction techniques are slack, which limits their applicability to small local regions of the input domain. In this paper, we propose Global Interval Neural Network Abstractions with Center-Exact Reconstruction (GINNACER). Our novel abstraction technique produces sound over-approximation bounds over the whole input domain while guaranteeing exact reconstructions for any given local input. Our experiments show that GINNACER is several orders of magnitude tighter than state-of-the-art global abstraction techniques, while being competitive with local ones.
    Understanding the Diffusion Objective as a Weighted Integral of ELBOs. (arXiv:2303.00848v3 [cs.LG] UPDATED)
    Diffusion models in the literature are optimized with various objectives that are special cases of a weighted loss, where the weighting function specifies the weight per noise level. Uniform weighting corresponds to maximizing the ELBO, a principled approximation of maximum likelihood. In current practice diffusion models are optimized with non-uniform weighting due to better results in terms of sample quality. In this work we expose a direct relationship between the weighted loss (with any weighting) and the ELBO objective. We show that the weighted loss can be written as a weighted integral of ELBOs, with one ELBO per noise level. If the weighting function is monotonic, then the weighted loss is a likelihood-based objective: it maximizes the ELBO under simple data augmentation, namely Gaussian noise perturbation. Our main contribution is a deeper theoretical understanding of the diffusion objective, but we also performed some experiments comparing monotonic with non-monotonic weightings, finding that monotonic weighting performs competitively with the best published results.
    Neural signature kernels as infinite-width-depth-limits of controlled ResNets. (arXiv:2303.17671v1 [math.DS])
    Motivated by the paradigm of reservoir computing, we consider randomly initialized controlled ResNets defined as Euler-discretizations of neural controlled differential equations (Neural CDEs). We show that in the infinite-width-then-depth limit and under proper scaling, these architectures converge weakly to Gaussian processes indexed on some spaces of continuous paths and with kernels satisfying certain partial differential equations (PDEs) varying according to the choice of activation function. In the special case where the activation is the identity, we show that the equation reduces to a linear PDE and the limiting kernel agrees with the signature kernel of Salvi et al. (2021). In this setting, we also show that the width-depth limits commute. We name this new family of limiting kernels neural signature kernels. Finally, we show that in the infinite-depth regime, finite-width controlled ResNets converge in distribution to Neural CDEs with random vector fields which, depending on whether the weights are shared across layers, are either time-independent and Gaussian or behave like a matrix-valued Brownian motion.
    $\beta^{4}$-IRT: A New $\beta^{3}$-IRT with Enhanced Discrimination Estimation. (arXiv:2303.17731v1 [cs.LG])
    Item response theory aims to estimate respondent's latent skills from their responses in tests composed of items with different levels of difficulty. Several models of item response theory have been proposed for different types of tasks, such as binary or probabilistic responses, response time, multiple responses, among others. In this paper, we propose a new version of $\beta^3$-IRT, called $\beta^{4}$-IRT, which uses the gradient descent method to estimate the model parameters. In $\beta^3$-IRT, abilities and difficulties are bounded, thus we employ link functions in order to turn $\beta^{4}$-IRT into an unconstrained gradient descent process. The original $\beta^3$-IRT had a symmetry problem, meaning that, if an item was initialised with a discrimination value with the wrong sign, e.g. negative when the actual discrimination should be positive, the fitting process could be unable to recover the correct discrimination and difficulty values for the item. In order to tackle this limitation, we modelled the discrimination parameter as the product of two new parameters, one corresponding to the sign and the second associated to the magnitude. We also proposed sensible priors for all parameters. We performed experiments to compare $\beta^{4}$-IRT and $\beta^3$-IRT regarding parameter recovery and our new version outperformed the original $\beta^3$-IRT. Finally, we made $\beta^{4}$-IRT publicly available as a Python package, along with the implementation of $\beta^3$-IRT used in our experiments.
    Inferring networks from time series: a neural approach. (arXiv:2303.18059v1 [cs.LG])
    Network structures underlie the dynamics of many complex phenomena, from gene regulation and foodwebs to power grids and social media. Yet, as they often cannot be observed directly, their connectivities must be inferred from observations of their emergent dynamics. In this work we present a powerful and fast computational method to infer large network adjacency matrices from time series data using a neural network. Using a neural network provides uncertainty quantification on the prediction in a manner that reflects both the non-convexity of the inference problem as well as the noise on the data. This is useful since network inference problems are typically underdetermined, and a feature that has hitherto been lacking from network inference methods. We demonstrate our method's capabilities by inferring line failure locations in the British power grid from observations of its response to a power cut. Since the problem is underdetermined, many classical statistical tools (e.g. regression) will not be straightforwardly applicable. Our method, in contrast, provides probability densities on each edge, allowing the use of hypothesis testing to make meaningful probabilistic statements about the location of the power cut. We also demonstrate our method's ability to learn an entire cost matrix for a non-linear model from a dataset of economic activity in Greater London. Our method outperforms OLS regression on noisy data in terms of both speed and prediction accuracy, and scales as $N^2$ where OLS is cubic. Since our technique is not specifically engineered for network inference, it represents a general parameter estimation scheme that is applicable to any parameter dimension.
    Decentralized Weakly Convex Optimization Over the Stiefel Manifold. (arXiv:2303.17779v1 [math.OC])
    We focus on a class of non-smooth optimization problems over the Stiefel manifold in the decentralized setting, where a connected network of $n$ agents cooperatively minimize a finite-sum objective function with each component being weakly convex in the ambient Euclidean space. Such optimization problems, albeit frequently encountered in applications, are quite challenging due to their non-smoothness and non-convexity. To tackle them, we propose an iterative method called the decentralized Riemannian subgradient method (DRSM). The global convergence and an iteration complexity of $\mathcal{O}(\varepsilon^{-2} \log^2(\varepsilon^{-1}))$ for forcing a natural stationarity measure below $\varepsilon$ are established via the powerful tool of proximal smoothness from variational analysis, which could be of independent interest. Besides, we show the local linear convergence of the DRSM using geometrically diminishing stepsizes when the problem at hand further possesses a sharpness property. Numerical experiments are conducted to corroborate our theoretical findings.
    Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version]. (arXiv:2303.17618v1 [cs.LG])
    We introduce an adaptive refinement procedure for smart, and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains, and we use it as a loss function. Our technique is prone to data-driven frameworks, but not restricted to. We also study properties of the above mentioned metric between Markov chains, which we believe could be of application for wider purpose. We propose an algorithm to approximate it, and we show that our method yields a much better computational complexity than using classical linear programming techniques.
    A Note On Nonlinear Regression Under L2 Loss. (arXiv:2303.17745v1 [cs.LG])
    We investigate the nonlinear regression problem under L2 loss (square loss) functions. Traditional nonlinear regression models often result in non-convex optimization problems with respect to the parameter set. We show that a convex nonlinear regression model exists for the traditional least squares problem, which can be a promising towards designing more complex systems with easier to train models.
    BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Neural Networks on Commodity CPU Hardware. (arXiv:2303.17727v1 [cs.LG])
    Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, the process of training massive models consisting of hundreds of millions to billions of parameters requires the extensive use of specialized hardware accelerators, such as GPUs, which are only accessible to a limited number of institutions with considerable financial resources. Moreover, there is often an alarming carbon footprint associated with training and deploying these models. In this paper, we address these challenges by introducing BOLT, a sparse deep learning library for training massive neural network models on standard CPU hardware. BOLT provides a flexible, high-level API for constructing models that will be familiar to users of existing popular DL frameworks. By automatically tuning specialized hyperparameters, BOLT also abstracts away the algorithmic details of sparse network training. We evaluate BOLT on a number of machine learning tasks drawn from recommendations, search, natural language processing, and personalization. We find that our proposed system achieves competitive performance with state-of-the-art techniques at a fraction of the cost and energy consumption and an order-of-magnitude faster inference time. BOLT has also been successfully deployed by multiple businesses to address critical problems, and we highlight one customer deployment case study in the field of e-commerce.
    An evaluation of time series forecasting models on water consumption data: A case study of Greece. (arXiv:2303.17617v1 [cs.LG])
    In recent years, the increased urbanization and industrialization has led to a rising water demand and resources, thus increasing the gap between demand and supply. Proper water distribution and forecasting of water consumption are key factors in mitigating the imbalance of supply and demand by improving operations, planning and management of water resources. To this end, in this paper, several well-known forecasting algorithms are evaluated over time series, water consumption data from Greece, a country with diverse socio-economic and urbanization issues. The forecasting algorithms are evaluated on a real-world dataset provided by the Water Supply and Sewerage Company of Greece revealing key insights about each algorithm and its use.
    Supervised Training of Conditional Monge Maps. (arXiv:2206.14262v2 [cs.LG] UPDATED)
    Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another. That theory has been mostly used to estimate, given a pair of source and target probability measures $(\mu, \nu)$, a parameterized map $T_\theta$ that can efficiently map $\mu$ onto $\nu$. In many applications, such as predicting cell responses to treatments, pairs of input/output data measures $(\mu, \nu)$ that define optimal transport problems do not arise in isolation but are associated with a context $c$, as for instance a treatment when comparing populations of untreated and treated cells. To account for that context in OT estimation, we introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable, using several pairs of measures $\left(\mu_i, \nu_i\right)$ tagged with a context label $c_i$. CondOT learns a global map $\mathcal{T}_\theta$ conditioned on context that is not only expected to fit all labeled pairs in the dataset $\left\{\left(c_i,\left(\mu_i, \nu_i\right)\right)\right\}$, i.e., $\mathcal{T}_\theta\left(c_i\right) \sharp \mu_i \approx \nu_i$, but should also generalize to produce meaningful maps $\mathcal{T}_\theta\left(c_{\text {new }}\right)$ when conditioned on unseen contexts $c_{\text {new }}$. Our approach harnesses and provides a novel usage for partially input convex neural networks, for which we introduce a robust and efficient initialization strategy inspired by Gaussian approximations. We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells, using only observations of the effects of said perturbations separately.
    Factor-augmented tree ensembles. (arXiv:2111.14000v5 [stat.ML] UPDATED)
    This manuscript proposes to extend the information set of time-series regression trees with latent stationary factors extracted via state-space methods. In doing so, this approach generalises time-series regression trees on two dimensions. First, it allows to handle predictors that exhibit measurement error, non-stationary trends, seasonality and/or irregularities such as missing observations. Second, it gives a transparent way for using domain-specific theory to inform time-series regression trees. Empirically, ensembles of these factor-augmented trees provide a reliable approach for macro-finance problems. This article highlights it focussing on the lead-lag effect between equity volatility and the business cycle in the United States.
    Simple Sorting Criteria Help Find the Causal Order in Additive Noise Models. (arXiv:2303.18211v1 [stat.ML])
    Additive Noise Models (ANM) encode a popular functional assumption that enables learning causal structure from observational data. Due to a lack of real-world data meeting the assumptions, synthetic ANM data are often used to evaluate causal discovery algorithms. Reisach et al. (2021) show that, for common simulation parameters, a variable ordering by increasing variance is closely aligned with a causal order and introduce var-sortability to quantify the alignment. Here, we show that not only variance, but also the fraction of a variable's variance explained by all others, as captured by the coefficient of determination $R^2$, tends to increase along the causal order. Simple baseline algorithms can use $R^2$-sortability to match the performance of established methods. Since $R^2$-sortability is invariant under data rescaling, these algorithms perform equally well on standardized or rescaled data, addressing a key limitation of algorithms exploiting var-sortability. We characterize and empirically assess $R^2$-sortability for different simulation parameters. We show that all simulation parameters can affect $R^2$-sortability and must be chosen deliberately to control the difficulty of the causal discovery task and the real-world plausibility of the simulated data. We provide an implementation of the sortability measures and sortability-based algorithms in our library CausalDisco (https://github.com/CausalDisco/CausalDisco).
    Ensemble Methods for Multi-Organ Segmentation in CT Series. (arXiv:2303.17956v1 [eess.IV])
    In the medical images field, semantic segmentation is one of the most important, yet difficult and time-consuming tasks to be performed by physicians. Thanks to the recent advancement in the Deep Learning models regarding Computer Vision, the promise to automate this kind of task is getting more and more realistic. However, many problems are still to be solved, like the scarce availability of data and the difficulty to extend the efficiency of highly specialised models to general scenarios. Organs at risk segmentation for radiotherapy treatment planning falls in this category, as the limited data available negatively affects the possibility to develop general-purpose models; in this work, we focus on the possibility to solve this problem by presenting three types of ensembles of single-organ models able to produce multi-organ masks exploiting the different specialisations of their components. The results obtained are promising and prove that this is a possible solution to finding efficient multi-organ segmentation methods.
    Solving morphological analogies: from retrieval to generation. (arXiv:2303.18062v1 [cs.CL])
    Analogical inference is a remarkable capability of human reasoning, and has been used to solve hard reasoning tasks. Analogy based reasoning (AR) has gained increasing interest from the artificial intelligence community and has shown its potential in multiple machine learning tasks such as classification, decision making and recommendation with competitive results. We propose a deep learning (DL) framework to address and tackle two key tasks in AR: analogy detection and solving. The framework is thoroughly tested on the Siganalogies dataset of morphological analogical proportions (APs) between words, and shown to outperform symbolic approaches in many languages. Previous work have explored the behavior of the Analogy Neural Network for classification (ANNc) on analogy detection and of the Analogy Neural Network for retrieval (ANNr) on analogy solving by retrieval, as well as the potential of an autoencoder (AE) for analogy solving by generating the solution word. In this article we summarize these findings and we extend them by combining ANNr and the AE embedding model, and checking the performance of ANNc as an retrieval method. The combination of ANNr and AE outperforms the other approaches in almost all cases, and ANNc as a retrieval method achieves competitive or better performance than 3CosMul. We conclude with general guidelines on using our framework to tackle APs with DL.
    An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. (arXiv:2303.17819v1 [eess.SY])
    In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. Finally, a method to determine a stabilizing policy to initialize the algorithm using only measured data is proposed.
    Physics and Chemistry from Parsimonious Representations: Image Analysis via Invariant Variational Autoencoders. (arXiv:2303.18236v1 [cs.LG])
    Electron, optical, and scanning probe microscopy methods are generating ever increasing volume of image data containing information on atomic and mesoscale structures and functionalities. This necessitates the development of the machine learning methods for discovery of physical and chemical phenomena from the data, such as manifestations of symmetry breaking in electron and scanning tunneling microscopy images, variability of the nanoparticles. Variational autoencoders (VAEs) are emerging as a powerful paradigm for the unsupervised data analysis, allowing to disentangle the factors of variability and discover optimal parsimonious representation. Here, we summarize recent developments in VAEs, covering the basic principles and intuition behind the VAEs. The invariant VAEs are introduced as an approach to accommodate scale and translation invariances present in imaging data and separate known factors of variations from the ones to be discovered. We further describe the opportunities enabled by the control over VAE architecture, including conditional, semi-supervised, and joint VAEs. Several case studies of VAE applications for toy models and experimental data sets in Scanning Transmission Electron Microscopy are discussed, emphasizing the deep connection between VAE and basic physical principles. All the codes used here are available at https://github.com/saimani5/VAE-tutorials and this article can be used as an application guide when applying these to own data sets.
    Continual evaluation for lifelong learning: Identifying the stability gap. (arXiv:2205.13452v2 [cs.LG] UPDATED)
    Time-dependent data-generating distributions have proven to be difficult for gradient-based training of neural networks, as the greedy updates result in catastrophic forgetting of previously learned knowledge. Despite the progress in the field of continual learning to overcome this forgetting, we show that a set of common state-of-the-art methods still suffers from substantial forgetting upon starting to learn new tasks, except that this forgetting is temporary and followed by a phase of performance recovery. We refer to this intriguing but potentially problematic phenomenon as the stability gap. The stability gap had likely remained under the radar due to standard practice in the field of evaluating continual learning models only after each task. Instead, we establish a framework for continual evaluation that uses per-iteration evaluation and we define a new set of metrics to quantify worst-case performance. Empirically we show that experience replay, constraint-based replay, knowledge-distillation, and parameter regularization methods are all prone to the stability gap; and that the stability gap can be observed in class-, task-, and domain-incremental learning benchmarks. Additionally, a controlled experiment shows that the stability gap increases when tasks are more dissimilar. Finally, by disentangling gradients into plasticity and stability components, we propose a conceptual explanation for the stability gap.
    Scalable Bayesian Meta-Learning through Generalized Implicit Gradients. (arXiv:2303.17768v1 [cs.LG])
    Meta-learning owns unique effectiveness and swiftness in tackling emerging tasks with limited data. Its broad applicability is revealed by viewing it as a bi-level optimization problem. The resultant algorithmic viewpoint however, faces scalability issues when the inner-level optimization relies on gradient-based iterations. Implicit differentiation has been considered to alleviate this challenge, but it is restricted to an isotropic Gaussian prior, and only favors deterministic meta-learning approaches. This work markedly mitigates the scalability bottleneck by cross-fertilizing the benefits of implicit differentiation to probabilistic Bayesian meta-learning. The novel implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors, but also quantifies the associated uncertainty. Furthermore, the ultimate complexity is well controlled regardless of the inner-level optimization trajectory. Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one. Extensive numerical tests are also carried out to empirically validate the performance of the proposed method.
    Optimal training of integer-valued neural networks with mixed integer programming. (arXiv:2009.03825v5 [cs.LG] UPDATED)
    Recent work has shown potential in using Mixed Integer Programming (MIP) solvers to optimize certain aspects of neural networks (NNs). However the intriguing approach of training NNs with MIP solvers is under-explored. State-of-the-art-methods to train NNs are typically gradient-based and require significant data, computation on GPUs, and extensive hyper-parameter tuning. In contrast, training with MIP solvers does not require GPUs or heavy hyper-parameter tuning, but currently cannot handle anything but small amounts of data. This article builds on recent advances that train binarized NNs using MIP solvers. We go beyond current work by formulating new MIP models which improve training efficiency and which can train the important class of integer-valued neural networks (INNs). We provide two novel methods to further the potential significance of using MIP to train NNs. The first method optimizes the number of neurons in the NN while training. This reduces the need for deciding on network architecture before training. The second method addresses the amount of training data which MIP can feasibly handle: we provide a batch training method that dramatically increases the amount of data that MIP solvers can use to train. We thus provide a promising step towards using much more data than before when training NNs using MIP models. Experimental results on two real-world data-limited datasets demonstrate that our approach strongly outperforms the previous state of the art in training NN with MIP, in terms of accuracy, training time and amount of data. Our methodology is proficient at training NNs when minimal training data is available, and at training with minimal memory requirements -- which is potentially valuable for deploying to low-memory devices.
    How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization. (arXiv:2210.06441v2 [cs.LG] UPDATED)
    Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an exchange rate between augmented and additional real data, we find that in out-of-distribution testing scenarios, augmentations which yield samples that are diverse, but inconsistent with the data distribution can be even more valuable than additional training data. Moreover, we find that data augmentations which encourage invariances can be more valuable than invariance alone, especially on small and medium sized training sets. Following this observation, we show that augmentations induce additional stochasticity during training, effectively flattening the loss landscape.
    Learning-based Observer Evaluated on the Kinematic Bicycle Model. (arXiv:2303.17933v1 [cs.RO])
    The knowledge of the states of a vehicle is a necessity to perform proper planning and control. These quantities are usually accessible through measurements. Control theory brings extremely useful methods -- observers -- to deal with quantities that cannot be directly measured or with noisy measurements. Classical observers are mathematically derived from models. In spite of their success, such as the Kalman filter, they show their limits when systems display high non-linearities, modeling errors, high uncertainties or difficult interactions with the environment (e.g. road contact). In this work, we present a method to build a learning-based observer able to outperform classical observing methods. We compare several neural network architectures and define the data generation procedure used to train them. The method is evaluated on a kinematic bicycle model which allows to easily generate data for training and testing. This model is also used in an Extended Kalman Filter (EKF) for comparison of the learning-based observer with a state of the art model-based observer. The results prove the interest of our approach and pave the way for future improvements of the technique.
    Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency. (arXiv:2303.18191v1 [cs.CR])
    Deep neural networks are proven to be vulnerable to backdoor attacks. Detecting the trigger samples during the inference stage, i.e., the test-time trigger sample detection, can prevent the backdoor from being triggered. However, existing detection methods often require the defenders to have high accessibility to victim models, extra clean data, or knowledge about the appearance of backdoor triggers, limiting their practicality. In this paper, we propose the test-time corruption robustness consistency evaluation (TeCo), a novel test-time trigger sample detection method that only needs the hard-label outputs of the victim models without any extra information. Our journey begins with the intriguing observation that the backdoor-infected models have similar performance across different image corruptions for the clean images, but perform discrepantly for the trigger samples. Based on this phenomenon, we design TeCo to evaluate test-time robustness consistency by calculating the deviation of severity that leads to predictions' transition across different corruptions. Extensive experiments demonstrate that compared with state-of-the-art defenses, which even require either certain information about the trigger types or accessibility of clean data, TeCo outperforms them on different backdoor attacks, datasets, and model architectures, enjoying a higher AUROC by 10% and 5 times of stability.
    Implementation and (Inverse Modified) Error Analysis for implicitly-templated ODE-nets. (arXiv:2303.17824v1 [math.NA])
    We focus on learning hidden dynamics from data using ODE-nets templated on implicit numerical initial value problem solvers. First, we perform Inverse Modified error analysis of the ODE-nets using unrolled implicit schemes for ease of interpretation. It is shown that training an ODE-net using an unrolled implicit scheme returns a close approximation of an Inverse Modified Differential Equation (IMDE). In addition, we establish a theoretical basis for hyper-parameter selection when training such ODE-nets, whereas current strategies usually treat numerical integration of ODE-nets as a black box. We thus formulate an adaptive algorithm which monitors the level of error and adapts the number of (unrolled) implicit solution iterations during the training process, so that the error of the unrolled approximation is less than the current learning loss. This helps accelerate training, while maintaining accuracy. Several numerical experiments are performed to demonstrate the advantages of the proposed algorithm compared to nonadaptive unrollings, and validate the theoretical analysis. We also note that this approach naturally allows for incorporating partially known physical terms in the equations, giving rise to what is termed ``gray box" identification.
    Promoting Non-Cooperation Through Ordering. (arXiv:2303.17971v1 [cs.MA])
    In many real world situations, like minor traffic offenses in big cities, a central authority is tasked with periodic administering punishments to a large number of individuals. Common practice is to give each individual a chance to suffer a smaller fine and be guaranteed to avoid the legal process with probable considerably larger punishment. However, thanks to the large number of offenders and a limited capacity of the central authority, the individual risk is typically small and a rational individual will not choose to pay the fine. Here we show that if the central authority processes the offenders in a publicly known order, it properly incentives the offenders to pay the fine. We show analytically and on realistic experiments that our mechanism promotes non-cooperation and incentives individuals to pay. Moreover, the same holds for an arbitrary coalition. We quantify the expected total payment the central authority receives, and show it increases considerably.
    Simple Domain Generalization Methods are Strong Baselines for Open Domain Generalization. (arXiv:2303.18031v1 [cs.CV])
    In real-world applications, a machine learning model is required to handle an open-set recognition (OSR), where unknown classes appear during the inference, in addition to a domain shift, where the distribution of data differs between the training and inference phases. Domain generalization (DG) aims to handle the domain shift situation where the target domain of the inference phase is inaccessible during model training. Open domain generalization (ODG) takes into account both DG and OSR. Domain-Augmented Meta-Learning (DAML) is a method targeting ODG but has a complicated learning process. On the other hand, although various DG methods have been proposed, they have not been evaluated in ODG situations. This work comprehensively evaluates existing DG methods in ODG and shows that two simple DG methods, CORrelation ALignment (CORAL) and Maximum Mean Discrepancy (MMD), are competitive with DAML in several cases. In addition, we propose simple extensions of CORAL and MMD by introducing the techniques used in DAML, such as ensemble learning and Dirichlet mixup data augmentation. The experimental evaluation demonstrates that the extended CORAL and MMD can perform comparably to DAML with lower computational costs. This suggests that the simple DG methods and their simple extensions are strong baselines for ODG. The code used in the experiments is available at https://github.com/shiralab/OpenDG-Eval.
    Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem. (arXiv:2303.17708v1 [cs.SE])
    Software engineers develop, fine-tune, and deploy deep learning (DL) models. They use and re-use models in a variety of development frameworks and deploy them on a range of runtime environments. In this diverse ecosystem, engineers use DL model converters to move models from frameworks to runtime environments. However, errors in converters can compromise model quality and disrupt deployment. The failure frequency and failure modes of DL model converters are unknown. In this paper, we conduct the first failure analysis on DL model converters. Specifically, we characterize failures in model converters associated with ONNX (Open Neural Network eXchange). We analyze past failures in the ONNX converters in two major DL frameworks, PyTorch and TensorFlow. The symptoms, causes, and locations of failures (for N=200 issues), and trends over time are also reported. We also evaluate present-day failures by converting 8,797 models, both real-world and synthetically generated instances. The consistent result from both parts of the study is that DL model converters commonly fail by producing models that exhibit incorrect behavior: 33% of past failures and 8% of converted models fell into this category. Our results motivate future research on making DL software simpler to maintain, extend, and validate.
    TPMCF: Temporal QoS Prediction using Multi-Source Collaborative Features. (arXiv:2303.18201v1 [cs.SE])
    Recently, with the rapid deployment of service APIs, personalized service recommendations have played a paramount role in the growth of the e-commerce industry. Quality-of-Service (QoS) parameters determining the service performance, often used for recommendation, fluctuate over time. Thus, the QoS prediction is essential to identify a suitable service among functionally equivalent services over time. The contemporary temporal QoS prediction methods hardly achieved the desired accuracy due to various limitations, such as the inability to handle data sparsity and outliers and capture higher-order temporal relationships among user-service interactions. Even though some recent recurrent neural-network-based architectures can model temporal relationships among QoS data, prediction accuracy degrades due to the absence of other features (e.g., collaborative features) to comprehend the relationship among the user-service interactions. This paper addresses the above challenges and proposes a scalable strategy for Temporal QoS Prediction using Multi-source Collaborative-Features (TPMCF), achieving high prediction accuracy and faster responsiveness. TPMCF combines the collaborative-features of users/services by exploiting user-service relationship with the spatio-temporal auto-extracted features by employing graph convolution and transformer encoder with multi-head self-attention. We validated our proposed method on WS-DREAM-2 datasets. Extensive experiments showed TPMCF outperformed major state-of-the-art approaches regarding prediction accuracy while ensuring high scalability and reasonably faster responsiveness.
    Benchmarking FedAvg and FedCurv for Image Classification Tasks. (arXiv:2303.17942v1 [cs.LG])
    Classic Machine Learning techniques require training on data available in a single data lake. However, aggregating data from different owners is not always convenient for different reasons, including security, privacy and secrecy. Data carry a value that might vanish when shared with others; the ability to avoid sharing the data enables industrial applications where security and privacy are of paramount importance, making it possible to train global models by implementing only local policies which can be run independently and even on air-gapped data centres. Federated Learning (FL) is a distributed machine learning approach which has emerged as an effective way to address privacy concerns by only sharing local AI models while keeping the data decentralized. Two critical challenges of Federated Learning are managing the heterogeneous systems in the same federated network and dealing with real data, which are often not independently and identically distributed (non-IID) among the clients. In this paper, we focus on the second problem, i.e., the problem of statistical heterogeneity of the data in the same federated network. In this setting, local models might be strayed far from the local optimum of the complete dataset, thus possibly hindering the convergence of the federated model. Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv), aiming at tackling the non-IID setting, have already been proposed. This work provides an empirical assessment of the behaviour of FedAvg and FedCurv in common non-IID scenarios. Results show that the number of epochs per round is an important hyper-parameter that, when tuned appropriately, can lead to significant performance gains while reducing the communication cost. As a side product of this work, we release the non-IID version of the datasets we used so to facilitate further comparisons from the FL community.
    Why is the winner the best?. (arXiv:2303.17719v1 [cs.CV])
    International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The "typical" lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.  ( 4 min )
    Ensemble weather forecast post-processing with a flexible probabilistic neural network approach. (arXiv:2303.17610v1 [cs.LG])
    Ensemble forecast post-processing is a necessary step in producing accurate probabilistic forecasts. Conventional post-processing methods operate by estimating the parameters of a parametric distribution, frequently on a per-location or per-lead-time basis. We propose a novel, neural network-based method, which produces forecasts for all locations and lead times, jointly. To relax the distributional assumption of many post-processing methods, our approach incorporates normalizing flows as flexible parametric distribution estimators. This enables us to model varying forecast distributions in a mathematically exact way. We demonstrate the effectiveness of our method in the context of the EUPPBench benchmark, where we conduct temperature forecast post-processing for stations in a sub-region of western Europe. We show that our novel method exhibits state-of-the-art performance on the benchmark, outclassing our previous, well-performing entry. Additionally, by providing a detailed comparison of three variants of our novel post-processing method, we elucidate the reasons why our method outperforms per-lead-time-based approaches and approaches with distributional assumptions.  ( 2 min )
    Generating Adversarial Samples in Mini-Batches May Be Detrimental To Adversarial Robustness. (arXiv:2303.17720v1 [cs.LG])
    Neural networks have been proven to be both highly effective within computer vision, and highly vulnerable to adversarial attacks. Consequently, as the use of neural networks increases due to their unrivaled performance, so too does the threat posed by adversarial attacks. In this work, we build towards addressing the challenge of adversarial robustness by exploring the relationship between the mini-batch size used during adversarial sample generation and the strength of the adversarial samples produced. We demonstrate that an increase in mini-batch size results in a decrease in the efficacy of the samples produced, and we draw connections between these observations and the phenomenon of vanishing gradients. Next, we formulate loss functions such that adversarial sample strength is not degraded by mini-batch size. Our findings highlight a potential risk for underestimating the true (practical) strength of adversarial attacks, and a risk of overestimating a model's robustness. We share our codes to let others replicate our experiments and to facilitate further exploration of the connections between batch size and adversarial sample strength.  ( 2 min )
    Aligning a medium-size GPT model in English to a small closed domain in Spanish using reinforcement learning. (arXiv:2303.17649v1 [cs.CL])
    In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English for an open domain, to a small closed domain in Spanish. The application for which the model is finely tuned is the question answering task. To achieve this we also needed to train and implement another neural network (which we called the reward model) that could score and determine whether an answer is appropriate for a given question. This component served to improve the decoding and generation of the answers of the system. Numerical metrics such as BLEU and perplexity were used to evaluate the model, and human judgment was also used to compare the decoding technique with others. Finally, the results favored the proposed method, and it was determined that it is feasible to use a reward model to align the generation of responses.  ( 2 min )
    FairGen: Towards Fair Graph Generation. (arXiv:2303.17743v1 [cs.LG])
    There have been tremendous efforts over the past decades dedicated to the generation of realistic graphs in a variety of domains, ranging from social networks to computer networks, from gene regulatory networks to online transaction networks. Despite the remarkable success, the vast majority of these works are unsupervised in nature and are typically trained to minimize the expected graph reconstruction loss, which would result in the representation disparity issue in the generated graphs, i.e., the protected groups (often minorities) contribute less to the objective and thus suffer from systematically higher errors. In this paper, we aim to tailor graph generation to downstream mining tasks by leveraging label information and user-preferred parity constraint. In particular, we start from the investigation of representation disparity in the context of graph generative models. To mitigate the disparity, we propose a fairness-aware graph generative model named FairGen. Our model jointly trains a label-informed graph generation module and a fair representation learning module by progressively learning the behaviors of the protected and unprotected groups, from the `easy' concepts to the `hard' ones. In addition, we propose a generic context sampling strategy for graph generative models, which is proven to be capable of fairly capturing the contextual information of each group with a high probability. Experimental results on seven real-world data sets, including web-based graphs, demonstrate that FairGen (1) obtains performance on par with state-of-the-art graph generative models across six network properties, (2) mitigates the representation disparity issues in the generated graphs, and (3) substantially boosts the model performance by up to 17% in downstream tasks via data augmentation.  ( 3 min )
    Machine learning for discovering laws of nature. (arXiv:2303.17607v1 [cs.LG])
    A microscopic particle obeys the principles of quantum mechanics -- so where is the sharp boundary between the macroscopic and microscopic worlds? It was this "interpretation problem" that prompted Schr\"odinger to propose his famous thought experiment (a cat that is simultaneously both dead and alive) and sparked a great debate about the quantum measurement problem, and there is still no satisfactory answer yet. This is precisely the inadequacy of rigorous mathematical models in describing the laws of nature. We propose a computational model to describe and understand the laws of nature based on Darwin's natural selection. In fact, whether it's a macro particle, a micro electron or a security, they can all be considered as an entity, the change of this entity over time can be described by a data series composed of states and values. An observer can learn from this data series to construct theories (usually consisting of functions and differential equations). We don't model with the usual functions or differential equations, but with a state Decision Tree (determines the state of an entity) and a value Function Tree (determines the distance between two points of an entity). A state Decision Tree and a value Function Tree together can reconstruct an entity's trajectory and make predictions about its future trajectory. Our proposed algorithmic model discovers laws of nature by only learning observed historical data (sequential measurement of observables) based on maximizing the observer's expected value. There is no differential equation in our model; our model has an emphasis on machine learning, where the observer builds up his/her experience by being rewarded or punished for each decision he/she makes, and eventually leads to rediscovering Newton's law, the Born rule (quantum mechanics) and the efficient market hypothesis (financial market).  ( 3 min )
    Exact Characterization of the Convex Hulls of Reachable Sets. (arXiv:2303.17674v1 [math.OC])
    We study the convex hulls of reachable sets of nonlinear systems with bounded disturbances. Reachable sets play a critical role in control, but remain notoriously challenging to compute, and existing over-approximation tools tend to be conservative or computationally expensive. In this work, we exactly characterize the convex hulls of reachable sets as the convex hulls of solutions of an ordinary differential equation from all possible initial values of the disturbances. This finite-dimensional characterization unlocks a tight estimation algorithm to over-approximate reachable sets that is significantly faster and more accurate than existing methods. We present applications to neural feedback loop analysis and robust model predictive control.  ( 2 min )
    Self-Refine: Iterative Refinement with Self-Feedback. (arXiv:2303.17651v1 [cs.CL])
    Like people, LLMs do not always generate the best text for a given generation problem on their first try (e.g., summaries, answers, explanations). Just as people then refine their text, we introduce SELF-REFINE, a framework for similarly improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an output using an LLM, then allow the same model to provide multi-aspect feedback for its own output; finally, the same model refines its previously generated output given its own feedback. Unlike earlier work, our iterative refinement framework does not require supervised training data or reinforcement learning, and works with a single LLM. We experiment with 7 diverse tasks, ranging from review rewriting to math reasoning, demonstrating that our approach outperforms direct generation. In all tasks, outputs generated with SELF-REFINE are preferred by humans and by automated metrics over those generated directly with GPT-3.5 and GPT-4, improving on average by absolute 20% across tasks.  ( 2 min )
    A Characterization of Online Multiclass Learnability. (arXiv:2303.17716v1 [cs.LG])
    We consider the problem of online multiclass learning when the number of labels is unbounded. We show that the Multiclass Littlestone dimension, first introduced in \cite{DanielyERMprinciple}, continues to characterize online learnability in this setting. Our result complements the recent work by \cite{Brukhimetal2022} who give a characterization of batch multiclass learnability when the label space is unbounded.  ( 2 min )
    Task Oriented Conversational Modelling With Subjective Knowledge. (arXiv:2303.17695v1 [cs.CL])
    Existing conversational models are handled by a database(DB) and API based systems. However, very often users' questions require information that cannot be handled by such systems. Nonetheless, answers to these questions are available in the form of customer reviews and FAQs. DSTC-11 proposes a three stage pipeline consisting of knowledge seeking turn detection, knowledge selection and response generation to create a conversational model grounded on this subjective knowledge. In this paper, we focus on improving the knowledge selection module to enhance the overall system performance. In particular, we propose entity retrieval methods which result in an accurate and faster knowledge search. Our proposed Named Entity Recognition (NER) based entity retrieval method results in 7X faster search compared to the baseline model. Additionally, we also explore a potential keyword extraction method which can improve the accuracy of knowledge selection. Preliminary results show a 4 \% improvement in exact match score on knowledge selection task. The code is available https://github.com/raja-kumar/knowledge-grounded-TODS  ( 2 min )
    Generalized Information Bottleneck for Gaussian Variables. (arXiv:2303.17762v1 [cs.IT])
    The information bottleneck (IB) method offers an attractive framework for understanding representation learning, however its applications are often limited by its computational intractability. Analytical characterization of the IB method is not only of practical interest, but it can also lead to new insights into learning phenomena. Here we consider a generalized IB problem, in which the mutual information in the original IB method is replaced by correlation measures based on Renyi and Jeffreys divergences. We derive an exact analytical IB solution for the case of Gaussian correlated variables. Our analysis reveals a series of structural transitions, similar to those previously observed in the original IB case. We find further that although solving the original, Renyi and Jeffreys IB problems yields different representations in general, the structural transitions occur at the same critical tradeoff parameters, and the Renyi and Jeffreys IB solutions perform well under the original IB objective. Our results suggest that formulating the IB method with alternative correlation measures could offer a strategy for obtaining an approximate solution to the original IB problem.  ( 2 min )
    Patterns Detection in Glucose Time Series by Domain Transformations and Deep Learning. (arXiv:2303.17616v1 [cs.LG])
    People with diabetes have to manage their blood glucose level to keep it within an appropriate range. Predicting whether future glucose values will be outside the healthy threshold is of vital importance in order to take corrective actions to avoid potential health damage. In this paper we describe our research with the aim of predicting the future behavior of blood glucose levels, so that hypoglycemic events may be anticipated. The approach of this work is the application of transformation functions on glucose time series, and their use in convolutional neural networks. We have tested our proposed method using real data from 4 different diabetes patients with promising results.  ( 2 min )
    Practical Policy Optimization with Personalized Experimentation. (arXiv:2303.17648v1 [cs.LG])
    Many organizations measure treatment effects via an experimentation platform to evaluate the casual effect of product variations prior to full-scale deployment. However, standard experimentation platforms do not perform optimally for end user populations that exhibit heterogeneous treatment effects (HTEs). Here we present a personalized experimentation framework, Personalized Experiments (PEX), which optimizes treatment group assignment at the user level via HTE modeling and sequential decision policy optimization to optimize multiple short-term and long-term outcomes simultaneously. We describe an end-to-end workflow that has proven to be successful in practice and can be readily implemented using open-source software.  ( 2 min )
    CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society. (arXiv:2303.17760v1 [cs.AI])
    The rapid advancement of conversational and chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents and provide insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of chat agents, providing a valuable resource for investigating conversational language models. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond. The GitHub repository of this project is made publicly available on: https://github.com/lightaime/camel.  ( 2 min )
    Progress towards an improved particle flow algorithm at CMS with machine learning. (arXiv:2303.17657v1 [physics.data-an])
    The particle-flow (PF) algorithm, which infers particles based on tracks and calorimeter clusters, is of central importance to event reconstruction in the CMS experiment at the CERN LHC, and has been a focus of development in light of planned Phase-2 running conditions with an increased pileup and detector granularity. In recent years, the machine learned particle-flow (MLPF) algorithm, a graph neural network that performs PF reconstruction, has been explored in CMS, with the possible advantages of directly optimizing for the physical quantities of interest, being highly reconfigurable to new conditions, and being a natural fit for deployment to heterogeneous accelerators. We discuss progress in CMS towards an improved implementation of the MLPF reconstruction, now optimized using generator/simulation-level particle information as the target for the first time. This paves the way to potentially improving the detector response in terms of physical quantities of interest. We describe the simulation-based training target, progress and studies on event-based loss terms, details on the model hyperparameter tuning, as well as physics validation with respect to the current PF algorithm in terms of high-level physical quantities such as the jet and missing transverse momentum resolutions. We find that the MLPF algorithm, trained on a generator/simulator level particle information for the first time, results in broadly compatible particle and jet reconstruction performance with the baseline PF, setting the stage for improving the physics performance by additional training statistics and model tuning.  ( 3 min )
    MetaEnhance: Metadata Quality Improvement for Electronic Theses and Dissertations of University Libraries. (arXiv:2303.17661v1 [cs.DL])
    Metadata quality is crucial for digital objects to be discovered through digital library interfaces. However, due to various reasons, the metadata of digital objects often exhibits incomplete, inconsistent, and incorrect values. We investigate methods to automatically detect, correct, and canonicalize scholarly metadata, using seven key fields of electronic theses and dissertations (ETDs) as a case study. We propose MetaEnhance, a framework that utilizes state-of-the-art artificial intelligence methods to improve the quality of these fields. To evaluate MetaEnhance, we compiled a metadata quality evaluation benchmark containing 500 ETDs, by combining subsets sampled using multiple criteria. We tested MetaEnhance on this benchmark and found that the proposed methods achieved nearly perfect F1-scores in detecting errors and F1-scores in correcting errors ranging from 0.85 to 1.00 for five of seven fields.  ( 2 min )
    Optimal Input Gain: All You Need to Supercharge a Feed-Forward Neural Network. (arXiv:2303.17732v1 [cs.LG])
    Linear transformation of the inputs alters the training performance of feed-forward networks that are otherwise equivalent. However, most linear transforms are viewed as a pre-processing operation separate from the actual training. Starting from equivalent networks, it is shown that pre-processing inputs using linear transformation are equivalent to multiplying the negative gradient matrix with an autocorrelation matrix per training iteration. Second order method is proposed to find the autocorrelation matrix that maximizes learning in a given iteration. When the autocorrelation matrix is diagonal, the method optimizes input gains. This optimal input gain (OIG) approach is used to improve two first-order two-stage training algorithms, namely back-propagation (BP) and hidden weight optimization (HWO), which alternately update the input weights and solve linear equations for output weights. Results show that the proposed OIG approach greatly enhances the performance of the first-order algorithms, often allowing them to rival the popular Levenberg-Marquardt approach with far less computation. It is shown that HWO is equivalent to BP with Whitening transformation applied to the inputs. HWO effectively combines Whitening transformation with learning. Thus, OIG improved HWO could be a significant building block to more complex deep learning architectures.  ( 2 min )
    oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes. (arXiv:2303.17612v1 [cs.CL])
    In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models, which allows Natural Language Processing (NLP) practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization and leverages frozen embeddings to improve knowledge distillation, and improved model initialization to deliver higher accuracy on a a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from the BERT with respect to pruning during pre-training and fine-tuning and find it less amenable to compression during fine-tuning. We explore the use of oBERTa on a broad seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERTBASE and exceed the performance of Prune OFA Large on the SQUAD V1.1 Question Answering dataset, despite being 8x and 2x, respectively, faster in inference. We release our code, training regimes, and associated model for broad usage to encourage usage and experimentation.  ( 2 min )
    Mitigating Source Bias for Fairer Weak Supervision. (arXiv:2303.17713v1 [cs.LG])
    Weak supervision overcomes the label bottleneck, enabling efficient development of training sets. Millions of models trained on such datasets have been deployed in the real world and interact with users on a daily basis. However, the techniques that make weak supervision attractive -- such as integrating any source of signal to estimate unknown labels -- also ensure that the pseudolabels it produces are highly biased. Surprisingly, given everyday use and the potential for increased bias, weak supervision has not been studied from the point of view of fairness. This work begins such a study. Our departure point is the observation that even when a fair model can be built from a dataset with access to ground-truth labels, the corresponding dataset labeled via weak supervision can be arbitrarily unfair. Fortunately, not all is lost: we propose and empirically validate a model for source unfairness in weak supervision, then introduce a simple counterfactual fairness-based technique that can mitigate these biases. Theoretically, we show that it is possible for our approach to simultaneously improve both accuracy and fairness metrics -- in contrast to standard fairness approaches that suffer from tradeoffs. Empirically, we show that our technique improves accuracy on weak supervision baselines by as much as 32% while reducing demographic parity gap by 82.5%.  ( 3 min )
    Utilizing Reinforcement Learning for de novo Drug Design. (arXiv:2303.17615v1 [q-bio.BM])
    Deep learning-based approaches for generating novel drug molecules with specific properties have gained a lot of interest in the last years. Recent studies have demonstrated promising performance for string-based generation of novel molecules utilizing reinforcement learning. In this paper, we develop a unified framework for using reinforcement learning for de novo drug design, wherein we systematically study various on- and off-policy reinforcement learning algorithms and replay buffers to learn an RNN-based policy to generate novel molecules predicted to be active against the dopamine receptor DRD2. Our findings suggest that it is advantageous to use at least both top-scoring and low-scoring molecules for updating the policy when structural diversity is essential. Using all generated molecules at an iteration seems to enhance performance stability for on-policy algorithms. In addition, when replaying high, intermediate, and low-scoring molecules, off-policy algorithms display the potential of improving the structural diversity and number of active molecules generated, but possibly at the cost of a longer exploration phase. Our work provides an open-source framework enabling researchers to investigate various reinforcement learning methods for de novo drug design.  ( 2 min )
  • Open

    TAP-Vid: A Benchmark for Tracking Any Point in a Video. (arXiv:2211.03726v2 [cs.CV] UPDATED)
    Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.  ( 2 min )
    Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data. (arXiv:2301.00437v3 [cs.LG] UPDATED)
    Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse ($\mathcal{NC}$). Recent papers have theoretically shown that $\mathcal{NC}$ emerges in the global minimizers of training problems with the simplified ``unconstrained feature model''. In this context, we take a step further and prove the $\mathcal{NC}$ occurrences in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit $\mathcal{NC}$ properties across the linear layers. Furthermore, we extend our study to imbalanced data for MSE loss and present the first geometric analysis of $\mathcal{NC}$ under bias-free setting. Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors, whose lengths depend on the amount of data in their corresponding classes. Finally, we empirically validate our theoretical analyses on synthetic and practical network architectures with both balanced and imbalanced scenarios.  ( 3 min )
    Differentially Private Stochastic Convex Optimization in (Non)-Euclidean Space Revisited. (arXiv:2303.18047v1 [cs.LG])
    In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) in Euclidean and general $\ell_p^d$ spaces. Specifically, we focus on three settings that are still far from well understood: (1) DP-SCO over a constrained and bounded (convex) set in Euclidean space; (2) unconstrained DP-SCO in $\ell_p^d$ space; (3) DP-SCO with heavy-tailed data over a constrained and bounded set in $\ell_p^d$ space. For problem (1), for both convex and strongly convex loss functions, we propose methods whose outputs could achieve (expected) excess population risks that are only dependent on the Gaussian width of the constraint set rather than the dimension of the space. Moreover, we also show the bound for strongly convex functions is optimal up to a logarithmic factor. For problems (2) and (3), we propose several novel algorithms and provide the first theoretical results for both cases when $1<p<2$ and $2\leq p\leq \infty$.  ( 2 min )
    A unified recipe for deriving (time-uniform) PAC-Bayes bounds. (arXiv:2302.03421v3 [stat.ML] UPDATED)
    We present a unified framework for deriving PAC-Bayesian generalization bounds. Unlike most previous literature on this topic, our bounds are anytime-valid (i.e., time-uniform), meaning that they hold at all stopping times, not only for a fixed sample size. Our approach combines four tools in the following order: (a) nonnegative supermartingales or reverse submartingales, (b) the method of mixtures, (c) the Donsker-Varadhan formula (or other convex duality principles), and (d) Ville's inequality. Our main result is a PAC-Bayes theorem which holds for a wide class of discrete stochastic processes. We show how this result implies time-uniform versions of well-known classical PAC-Bayes bounds, such as those of Seeger, McAllester, Maurer, and Catoni, in addition to many recent bounds. We also present several novel bounds. Our framework also enables us to relax traditional assumptions; in particular, we consider nonstationary loss functions and non-i.i.d. data. In sum, we unify the derivation of past bounds and ease the search for future bounds: one may simply check if our supermartingale or submartingale conditions are met and, if so, be guaranteed a (time-uniform) PAC-Bayes bound.
    Learning Treatment Effects in Panels with General Intervention Patterns. (arXiv:2106.02780v2 [stat.ML] UPDATED)
    The problem of causal inference with panel data is a central econometric question. The following is a fundamental version of this problem: Let $M^*$ be a low rank matrix and $E$ be a zero-mean noise matrix. For a `treatment' matrix $Z$ with entries in $\{0,1\}$ we observe the matrix $O$ with entries $O_{ij} := M^*_{ij} + E_{ij} + \mathcal{T}_{ij} Z_{ij}$ where $\mathcal{T}_{ij} $ are unknown, heterogenous treatment effects. The problem requires we estimate the average treatment effect $\tau^* := \sum_{ij} \mathcal{T}_{ij} Z_{ij} / \sum_{ij} Z_{ij}$. The synthetic control paradigm provides an approach to estimating $\tau^*$ when $Z$ places support on a single row. This paper extends that framework to allow rate-optimal recovery of $\tau^*$ for general $Z$, thus broadly expanding its applicability. Our guarantees are the first of their type in this general setting. Computational experiments on synthetic and real-world data show a substantial advantage over competing estimators.
    Near-Optimal Learning of Extensive-Form Games with Imperfect Information. (arXiv:2202.01752v2 [cs.LG] UPDATED)
    This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating \emph{balanced exploration policies} into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multi-player general-sum games.
    A Method for Evaluating Deep Generative Models of Images via Assessing the Reproduction of High-order Spatial Context. (arXiv:2111.12577v2 [cs.CV] UPDATED)
    Deep generative models (DGMs) have the potential to revolutionize diagnostic imaging. Generative adversarial networks (GANs) are one kind of DGM which are widely employed. The overarching problem with deploying GANs, and other DGMs, in any application that requires domain expertise in order to actually use the generated images is that there generally is not adequate or automatic means of assessing the domain-relevant quality of generated images. In this work, we demonstrate several objective tests of images output by two popular GAN architectures. We designed several stochastic context models (SCMs) of distinct image features that can be recovered after generation by a trained GAN. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect specific effects of the known arrangement rules. We then tested the rates at which two different GANs correctly reproduced the feature context under a variety of training scenarios, and degrees of feature-class similarity. We found that ensembles of generated images can appear largely accurate visually, and show high accuracy in ensemble measures, while not exhibiting the known spatial arrangements. Furthermore, GANs trained on a spectrum of distinct spatial orders did not respect the given prevalence of those orders in the training data. The main conclusion is that SCMs can be engineered to quantify numerous errors, per image, that may not be captured in ensemble statistics but plausibly can affect subsequent use of the GAN-generated images.
    Inference on eigenvectors of non-symmetric matrices. (arXiv:2303.18233v1 [math.ST])
    This paper argues that the symmetrisability condition in Tyler(1981) is not necessary to establish asymptotic inference procedures for eigenvectors. We establish distribution theory for a Wald and t-test for full-vector and individual coefficient hypotheses, respectively. Our test statistics originate from eigenprojections of non-symmetric matrices. Representing projections as a mapping from the underlying matrix to its spectral data, we find derivatives through analytic perturbation theory. These results demonstrate how the analytic perturbation theory of Sun(1991) is a useful tool in multivariate statistics and are of independent interest. As an application, we define confidence sets for Bonacich centralities estimated from adjacency matrices induced by directed graphs.
    Adaptive Estimators Show Information Compression in Deep Neural Networks. (arXiv:1902.09037v2 [cs.LG] UPDATED)
    To improve how neural networks function it is crucial to understand their learning process. The information bottleneck theory of deep learning proposes that neural networks achieve good generalization by compressing their representations to disregard information that is not relevant to the task. However, empirical evidence for this theory is conflicting, as compression was only observed when networks used saturating activation functions. In contrast, networks with non-saturating activation functions achieved comparable levels of task performance but did not show compression. In this paper we developed more robust mutual information estimation techniques, that adapt to hidden activity of neural networks and produce more sensitive measurements of activations from all functions, especially unbounded functions. Using these adaptive estimation techniques, we explored compression in networks with a range of different activation functions. With two improved methods of estimation, firstly, we show that saturation of the activation function is not required for compression, and the amount of compression varies between different activation functions. We also find that there is a large amount of variation in compression between different network initializations. Secondary, we see that L2 regularization leads to significantly increased compression, while preventing overfitting. Finally, we show that only compression of the last layer is positively correlated with generalization.
    A Note On Nonlinear Regression Under L2 Loss. (arXiv:2303.17745v1 [cs.LG])
    We investigate the nonlinear regression problem under L2 loss (square loss) functions. Traditional nonlinear regression models often result in non-convex optimization problems with respect to the parameter set. We show that a convex nonlinear regression model exists for the traditional least squares problem, which can be a promising towards designing more complex systems with easier to train models.
    Far from Asymptopia. (arXiv:2205.03343v2 [stat.OT] UPDATED)
    Inference from limited data requires a notion of measure on parameter space, most explicit in the Bayesian framework as a prior. Here we demonstrate that Jeffreys prior, the best-known uninformative choice, introduces enormous bias when applied to typical scientific models. Such models have a relevant effective dimensionality much smaller than the number of microscopic parameters. Because Jeffreys prior treats all microscopic parameters equally, it is from uniform when projected onto the sub-space of relevant parameters, due to variations in the local co-volume of irrelevant directions. We present results on a principled choice of measure which avoids this issue, leading to unbiased inference in complex models. This optimal prior depends on the quantity of data to be gathered, and approaches Jeffreys prior in the asymptotic limit. However, this limit cannot be justified without an impossibly large amount of data, exponential in the number of microscopic parameters.
    Rapid prediction of lab-grown tissue properties using deep learning. (arXiv:2303.18017v1 [q-bio.TO])
    The interactions between cells and the extracellular matrix are vital for the self-organisation of tissues. In this paper we present proof-of-concept to use machine learning tools to predict the role of this mechanobiology in the self-organisation of cell-laden hydrogels grown in tethered moulds. We develop a process for the automated generation of mould designs with and without key symmetries. We create a large training set with $N=6500$ cases by running detailed biophysical simulations of cell-matrix interactions using the contractile network dipole orientation (CONDOR) model for the self-organisation of cellular hydrogels within these moulds. These are used to train an implementation of the \texttt{pix2pix} deep learning model, reserving $740$ cases that were unseen in the training of the neural network for training and validation. Comparison between the predictions of the machine learning technique and the reserved predictions from the biophysical algorithm show that the machine learning algorithm makes excellent predictions. The machine learning algorithm is significantly faster than the biophysical method, opening the possibility of very high throughput rational design of moulds for pharmaceutical testing, regenerative medicine and fundamental studies of biology. Future extensions for scaffolds and 3D bioprinting will open additional applications.
    The Topology-Overlap Trade-Off in Retinal Arteriole-Venule Segmentation. (arXiv:2303.18022v1 [cs.CV])
    Retinal fundus images can be an invaluable diagnosis tool for screening epidemic diseases like hypertension or diabetes. And they become especially useful when the arterioles and venules they depict are clearly identified and annotated. However, manual annotation of these vessels is extremely time demanding and taxing, which calls for automatic segmentation. Although convolutional neural networks can achieve high overlap between predictions and expert annotations, they often fail to produce topologically correct predictions of tubular structures. This situation is exacerbated by the bifurcation versus crossing ambiguity which causes classification mistakes. This paper shows that including a topology preserving term in the loss function improves the continuity of the segmented vessels, although at the expense of artery-vein misclassification and overall lower overlap metrics. However, we show that by including an orientation score guided convolutional module, based on the anisotropic single sided cake wavelet, we reduce such misclassification and further increase the topology correctness of the results. We evaluate our model on public datasets with conveniently chosen metrics to assess both overlap and topology correctness, showing that our model is able to produce results on par with state-of-the-art from the point of view of overlap, while increasing topological accuracy.
    BERTino: an Italian DistilBERT model. (arXiv:2303.18121v1 [cs.CL])
    The recent introduction of Transformers language representation models allowed great improvements in many natural language processing (NLP) tasks. However, if on one hand the performances achieved by this kind of architectures are surprising, on the other their usability is limited by the high number of parameters which constitute their network, resulting in high computational and memory demands. In this work we present BERTino, a DistilBERT model which proposes to be the first lightweight alternative to the BERT architecture specific for the Italian language. We evaluated BERTino on the Italian ISDT, Italian ParTUT, Italian WikiNER and multiclass classification tasks, obtaining F1 scores comparable to those obtained by a BERTBASE with a remarkable improvement in training and inference speed.
    Synergistic Graph Fusion via Encoder Embedding. (arXiv:2303.18051v1 [cs.SI])
    In this paper, we introduce a novel approach to multi-graph embedding called graph fusion encoder embedding. The method is designed to work with multiple graphs that share a common vertex set. Under the supervised learning setting, we show that the resulting embedding exhibits a surprising yet highly desirable "synergistic effect": for sufficiently large vertex size, the vertex classification accuracy always benefits from additional graphs. We provide a mathematical proof of this effect under the stochastic block model, and identify the necessary and sufficient condition for asymptotically perfect classification. The simulations and real data experiments confirm the superiority of the proposed method, which consistently outperforms recent benchmark methods in classification.
    Large Dimensional Independent Component Analysis: Statistical Optimality and Computational Tractability. (arXiv:2303.18156v1 [math.ST])
    In this paper, we investigate the optimal statistical performance and the impact of computational constraints for independent component analysis (ICA). Our goal is twofold. On the one hand, we characterize the precise role of dimensionality on sample complexity and statistical accuracy, and how computational consideration may affect them. In particular, we show that the optimal sample complexity is linear in dimensionality, and interestingly, the commonly used sample kurtosis-based approaches are necessarily suboptimal. However, the optimal sample complexity becomes quadratic, up to a logarithmic factor, in the dimension if we restrict ourselves to estimates that can be computed with low-degree polynomial algorithms. On the other hand, we develop computationally tractable estimates that attain both the optimal sample complexity and minimax optimal rates of convergence. We study the asymptotic properties of the proposed estimates and establish their asymptotic normality that can be readily used for statistical inferences. Our method is fairly easy to implement and numerical experiments are presented to further demonstrate its practical merits.
    Evaluation Challenges for Geospatial ML. (arXiv:2303.18087v1 [cs.LG])
    As geospatial machine learning models and maps derived from their predictions are increasingly used for downstream analyses in science and policy, it is imperative to evaluate their accuracy and applicability. Geospatial machine learning has key distinctions from other learning paradigms, and as such, the correct way to measure performance of spatial machine learning outputs has been a topic of debate. In this paper, I delineate unique challenges of model evaluation for geospatial machine learning with global or remotely sensed datasets, culminating in concrete takeaways to improve evaluations of geospatial model performance.
    Time-uniform central limit theory and asymptotic confidence sequences. (arXiv:2103.06476v7 [math.ST] UPDATED)
    Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions, and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. To elaborate, our methods take the form of confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times, incurring no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, and hence do not enjoy the aforementioned broad applicability of asymptotic confidence intervals. Our work bridges the gap by giving a definition for "asymptotic CSs", and deriving a universal asymptotic CS that requires only weak CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles (stemming from the seminal 1960s work of Strassen and improvements by Koml\'os, Major, and Tusn\'ady) to uniformly approximate the entire sample average process by an implicit Gaussian process. We demonstrate their utility by deriving nonparametric asymptotic CSs for the average treatment effect based on doubly robust estimators in observational studies, for which no nonasymptotic methods can exist even in the fixed-time regime. This enables causal inference that can be continuously monitored and adaptively stopped.
    Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States. (arXiv:2303.17963v1 [eess.SY])
    As control engineering methods are applied to increasingly complex systems, data-driven approaches for system identification appear as a promising alternative to physics-based modeling. While many of these approaches rely on the availability of state measurements, the states of a complex system are often not directly measurable. It may then be necessary to jointly estimate the dynamics and a latent state, making it considerably more challenging to design controllers with performance guarantees. This paper proposes a novel method for the computation of an optimal input trajectory for unknown nonlinear systems with latent states. Probabilistic performance guarantees are derived for the resulting input trajectory, and an approach to validate the performance of arbitrary control laws is presented. The effectiveness of the proposed method is demonstrated in a numerical simulation.
    Bounded Simplex-Structured Matrix Factorization: Algorithms, Identifiability and Applications. (arXiv:2209.12638v2 [cs.LG] UPDATED)
    In this paper, we propose a new low-rank matrix factorization model dubbed bounded simplex-structured matrix factorization (BSSMF). Given an input matrix $X$ and a factorization rank $r$, BSSMF looks for a matrix $W$ with $r$ columns and a matrix $H$ with $r$ rows such that $X \approx WH$ where the entries in each column of $W$ are bounded, that is, they belong to given intervals, and the columns of $H$ belong to the probability simplex, that is, $H$ is column stochastic. BSSMF generalizes nonnegative matrix factorization (NMF), and simplex-structured matrix factorization (SSMF). BSSMF is particularly well suited when the entries of the input matrix $X$ belong to a given interval; for example when the rows of $X$ represent images, or $X$ is a rating matrix such as in the Netflix and MovieLens datasets where the entries of $X$ belong to the interval $[1,5]$. The simplex-structured matrix $H$ not only leads to an easily understandable decomposition providing a soft clustering of the columns of $X$, but implies that the entries of each column of $WH$ belong to the same intervals as the columns of $W$. In this paper, we first propose a fast algorithm for BSSMF, even in the presence of missing data in $X$. Then we provide identifiability conditions for BSSMF, that is, we provide conditions under which BSSMF admits a unique decomposition, up to trivial ambiguities. Finally, we illustrate the effectiveness of BSSMF on two applications: extraction of features in a set of images, and the matrix completion problem for recommender systems.
    Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness. (arXiv:2303.17765v1 [stat.ML])
    Representation multi-task learning (MTL) and transfer learning (TL) have achieved tremendous success in practice. However, the theoretical understanding of these methods is still lacking. Most existing theoretical works focus on cases where all tasks share the same representation, and claim that MTL and TL almost always improve performance. However, as the number of tasks grow, assuming all tasks share the same representation is unrealistic. Also, this does not always match empirical findings, which suggest that a shared representation may not necessarily improve single-task or target-only learning performance. In this paper, we aim to understand how to learn from tasks with \textit{similar but not exactly the same} linear representations, while dealing with outlier tasks. We propose two algorithms that are \textit{adaptive} to the similarity structure and \textit{robust} to outlier tasks under both MTL and TL settings. Our algorithms outperform single-task or target-only learning when representations across tasks are sufficiently similar and the fraction of outlier tasks is small. Furthermore, they always perform no worse than single-task learning or target-only learning, even when the representations are dissimilar. We provide information-theoretic lower bounds to show that our algorithms are nearly \textit{minimax} optimal in a large regime.
    Mitigating Source Bias for Fairer Weak Supervision. (arXiv:2303.17713v1 [cs.LG])
    Weak supervision overcomes the label bottleneck, enabling efficient development of training sets. Millions of models trained on such datasets have been deployed in the real world and interact with users on a daily basis. However, the techniques that make weak supervision attractive -- such as integrating any source of signal to estimate unknown labels -- also ensure that the pseudolabels it produces are highly biased. Surprisingly, given everyday use and the potential for increased bias, weak supervision has not been studied from the point of view of fairness. This work begins such a study. Our departure point is the observation that even when a fair model can be built from a dataset with access to ground-truth labels, the corresponding dataset labeled via weak supervision can be arbitrarily unfair. Fortunately, not all is lost: we propose and empirically validate a model for source unfairness in weak supervision, then introduce a simple counterfactual fairness-based technique that can mitigate these biases. Theoretically, we show that it is possible for our approach to simultaneously improve both accuracy and fairness metrics -- in contrast to standard fairness approaches that suffer from tradeoffs. Empirically, we show that our technique improves accuracy on weak supervision baselines by as much as 32% while reducing demographic parity gap by 82.5%.
    Microcanonical Langevin Monte Carlo. (arXiv:2303.18221v1 [hep-lat])
    We propose a method for sampling from an arbitrary distribution $\exp[-S(\x)]$ with an available gradient $\nabla S(\x)$, formulated as an energy-preserving stochastic differential equation (SDE). We derive the Fokker-Planck equation and show that both the deterministic drift and the stochastic diffusion separately preserve the stationary distribution. This implies that the drift-diffusion discretization schemes are bias-free, in contrast to the standard Langevin dynamics. We apply the method to the $\phi^4$ lattice field theory, showing the results agree with the standard sampling methods but with significantly higher efficiency compared to the current state-of-the-art samplers.
    Optimal training of integer-valued neural networks with mixed integer programming. (arXiv:2009.03825v5 [cs.LG] UPDATED)
    Recent work has shown potential in using Mixed Integer Programming (MIP) solvers to optimize certain aspects of neural networks (NNs). However the intriguing approach of training NNs with MIP solvers is under-explored. State-of-the-art-methods to train NNs are typically gradient-based and require significant data, computation on GPUs, and extensive hyper-parameter tuning. In contrast, training with MIP solvers does not require GPUs or heavy hyper-parameter tuning, but currently cannot handle anything but small amounts of data. This article builds on recent advances that train binarized NNs using MIP solvers. We go beyond current work by formulating new MIP models which improve training efficiency and which can train the important class of integer-valued neural networks (INNs). We provide two novel methods to further the potential significance of using MIP to train NNs. The first method optimizes the number of neurons in the NN while training. This reduces the need for deciding on network architecture before training. The second method addresses the amount of training data which MIP can feasibly handle: we provide a batch training method that dramatically increases the amount of data that MIP solvers can use to train. We thus provide a promising step towards using much more data than before when training NNs using MIP models. Experimental results on two real-world data-limited datasets demonstrate that our approach strongly outperforms the previous state of the art in training NN with MIP, in terms of accuracy, training time and amount of data. Our methodology is proficient at training NNs when minimal training data is available, and at training with minimal memory requirements -- which is potentially valuable for deploying to low-memory devices.
    A Characterization of Online Multiclass Learnability. (arXiv:2303.17716v1 [cs.LG])
    We consider the problem of online multiclass learning when the number of labels is unbounded. We show that the Multiclass Littlestone dimension, first introduced in \cite{DanielyERMprinciple}, continues to characterize online learnability in this setting. Our result complements the recent work by \cite{Brukhimetal2022} who give a characterization of batch multiclass learnability when the label space is unbounded.
    A Kernel Approach for PDE Discovery and Operator Learning. (arXiv:2210.08140v2 [stat.ML] UPDATED)
    This article presents a three-step framework for learning and solving partial differential equations (PDEs) using kernel methods. Given a training set consisting of pairs of noisy PDE solutions and source/boundary terms on a mesh, kernel smoothing is utilized to denoise the data and approximate derivatives of the solution. This information is then used in a kernel regression model to learn the algebraic form of the PDE. The learned PDE is then used within a kernel based solver to approximate the solution of the PDE with a new source/boundary term, thereby constituting an operator learning framework. Numerical experiments compare the method to state-of-the-art algorithms and demonstrate its competitive performance.
    Continual Learning of Multi-modal Dynamics with External Memory. (arXiv:2203.00936v3 [cs.LG] UPDATED)
    We study the problem of fitting a model to a dynamical environment when new modes of behavior emerge sequentially. The learning model is aware when a new mode appears, but it does not have access to the true modes of individual training sequences. The state-of-the-art continual learning approaches cannot handle this setup, because parameter transfer suffers from catastrophic interference and episodic memory design requires the knowledge of the ground-truth modes of sequences. We devise a novel continual learning method that overcomes both limitations by maintaining a descriptor of the mode of an encountered sequence in a neural episodic memory. We employ a Dirichlet Process prior on the attention weights of the memory to foster efficient storage of the mode descriptors. Our method performs continual learning by transferring knowledge across tasks by retrieving the descriptors of similar modes of past tasks to the mode of a current sequence and feeding this descriptor into its transition kernel as control input. We observe the continual learning performance of our method to compare favorably to the mainstream parameter transfer approach.
    Understanding the Diffusion Objective as a Weighted Integral of ELBOs. (arXiv:2303.00848v3 [cs.LG] UPDATED)
    Diffusion models in the literature are optimized with various objectives that are special cases of a weighted loss, where the weighting function specifies the weight per noise level. Uniform weighting corresponds to maximizing the ELBO, a principled approximation of maximum likelihood. In current practice diffusion models are optimized with non-uniform weighting due to better results in terms of sample quality. In this work we expose a direct relationship between the weighted loss (with any weighting) and the ELBO objective. We show that the weighted loss can be written as a weighted integral of ELBOs, with one ELBO per noise level. If the weighting function is monotonic, then the weighted loss is a likelihood-based objective: it maximizes the ELBO under simple data augmentation, namely Gaussian noise perturbation. Our main contribution is a deeper theoretical understanding of the diffusion objective, but we also performed some experiments comparing monotonic with non-monotonic weightings, finding that monotonic weighting performs competitively with the best published results.
    Factor-augmented tree ensembles. (arXiv:2111.14000v5 [stat.ML] UPDATED)
    This manuscript proposes to extend the information set of time-series regression trees with latent stationary factors extracted via state-space methods. In doing so, this approach generalises time-series regression trees on two dimensions. First, it allows to handle predictors that exhibit measurement error, non-stationary trends, seasonality and/or irregularities such as missing observations. Second, it gives a transparent way for using domain-specific theory to inform time-series regression trees. Empirically, ensembles of these factor-augmented trees provide a reliable approach for macro-finance problems. This article highlights it focussing on the lead-lag effect between equity volatility and the business cycle in the United States.
    Never a Dull Moment: Distributional Properties as a Baseline for Time-Series Classification. (arXiv:2303.17809v1 [stat.ME])
    The variety of complex algorithmic approaches for tackling time-series classification problems has grown considerably over the past decades, including the development of sophisticated but challenging-to-interpret deep-learning-based methods. But without comparison to simpler methods it can be difficult to determine when such complexity is required to obtain strong performance on a given problem. Here we evaluate the performance of an extremely simple classification approach -- a linear classifier in the space of two simple features that ignore the sequential ordering of the data: the mean and standard deviation of time-series values. Across a large repository of 128 univariate time-series classification problems, this simple distributional moment-based approach outperformed chance on 69 problems, and reached 100% accuracy on two problems. With a neuroimaging time-series case study, we find that a simple linear model based on the mean and standard deviation performs better at classifying individuals with schizophrenia than a model that additionally includes features of the time-series dynamics. Comparing the performance of simple distributional features of a time series provides important context for interpreting the performance of complex time-series classification models, which may not always be required to obtain high accuracy.
    Simple Sorting Criteria Help Find the Causal Order in Additive Noise Models. (arXiv:2303.18211v1 [stat.ML])
    Additive Noise Models (ANM) encode a popular functional assumption that enables learning causal structure from observational data. Due to a lack of real-world data meeting the assumptions, synthetic ANM data are often used to evaluate causal discovery algorithms. Reisach et al. (2021) show that, for common simulation parameters, a variable ordering by increasing variance is closely aligned with a causal order and introduce var-sortability to quantify the alignment. Here, we show that not only variance, but also the fraction of a variable's variance explained by all others, as captured by the coefficient of determination $R^2$, tends to increase along the causal order. Simple baseline algorithms can use $R^2$-sortability to match the performance of established methods. Since $R^2$-sortability is invariant under data rescaling, these algorithms perform equally well on standardized or rescaled data, addressing a key limitation of algorithms exploiting var-sortability. We characterize and empirically assess $R^2$-sortability for different simulation parameters. We show that all simulation parameters can affect $R^2$-sortability and must be chosen deliberately to control the difficulty of the causal discovery task and the real-world plausibility of the simulated data. We provide an implementation of the sortability measures and sortability-based algorithms in our library CausalDisco (https://github.com/CausalDisco/CausalDisco).
    An interpretable neural network-based non-proportional odds model for ordinal regression with continuous response. (arXiv:2303.17823v1 [stat.ME])
    This paper proposes an interpretable neural network-based non-proportional odds model (N$^3$POM) for ordinal regression, where the response variable can take not only discrete but also continuous values, and the regression coefficients vary depending on the predicting ordinal response. In contrast to conventional approaches estimating the linear coefficients of regression directly from the discrete response, we train a non-linear neural network that outputs the linear coefficients by taking the response as its input. By virtue of the neural network, N$^3$POM may have flexibility while preserving the interpretability of the conventional ordinal regression. We show a sufficient condition so that the predicted conditional cumulative probability~(CCP) satisfies the monotonicity constraint locally over a user-specified region in the covariate space; we also provide a monotonicity-preserving stochastic (MPS) algorithm for training the neural network adequately.
    $\beta^{4}$-IRT: A New $\beta^{3}$-IRT with Enhanced Discrimination Estimation. (arXiv:2303.17731v1 [cs.LG])
    Item response theory aims to estimate respondent's latent skills from their responses in tests composed of items with different levels of difficulty. Several models of item response theory have been proposed for different types of tasks, such as binary or probabilistic responses, response time, multiple responses, among others. In this paper, we propose a new version of $\beta^3$-IRT, called $\beta^{4}$-IRT, which uses the gradient descent method to estimate the model parameters. In $\beta^3$-IRT, abilities and difficulties are bounded, thus we employ link functions in order to turn $\beta^{4}$-IRT into an unconstrained gradient descent process. The original $\beta^3$-IRT had a symmetry problem, meaning that, if an item was initialised with a discrimination value with the wrong sign, e.g. negative when the actual discrimination should be positive, the fitting process could be unable to recover the correct discrimination and difficulty values for the item. In order to tackle this limitation, we modelled the discrimination parameter as the product of two new parameters, one corresponding to the sign and the second associated to the magnitude. We also proposed sensible priors for all parameters. We performed experiments to compare $\beta^{4}$-IRT and $\beta^3$-IRT regarding parameter recovery and our new version outperformed the original $\beta^3$-IRT. Finally, we made $\beta^{4}$-IRT publicly available as a Python package, along with the implementation of $\beta^3$-IRT used in our experiments.
    Constrained Optimization of Rank-One Functions with Indicator Variables. (arXiv:2303.18158v1 [math.OC])
    Optimization problems involving minimization of a rank-one convex function over constraints modeling restrictions on the support of the decision variables emerge in various machine learning applications. These problems are often modeled with indicator variables for identifying the support of the continuous variables. In this paper we investigate compact extended formulations for such problems through perspective reformulation techniques. In contrast to the majority of previous work that relies on support function arguments and disjunctive programming techniques to provide convex hull results, we propose a constructive approach that exploits a hidden conic structure induced by perspective functions. To this end, we first establish a convex hull result for a general conic mixed-binary set in which each conic constraint involves a linear function of independent continuous variables and a set of binary variables. We then demonstrate that extended representations of sets associated with epigraphs of rank-one convex functions over constraints modeling indicator relations naturally admit such a conic representation. This enables us to systematically give perspective formulations for the convex hull descriptions of these sets with nonlinear separable or non-separable objective functions, sign constraints on continuous variables, and combinatorial constraints on indicator variables. We illustrate the efficacy of our results on sparse nonnegative logistic regression problems.
    Adaptive joint distribution learning. (arXiv:2110.04829v2 [stat.ML] UPDATED)
    We develop a new framework for embedding joint probability distributions in tensor product reproducing kernel Hilbert spaces (RKHS). Our framework accommodates a low-dimensional, normalized and positive model of a Radon-Nikodym derivative, which we estimate from sample sizes of up to several million data points, alleviating the inherent limitations of RKHS modeling. Well-defined normalized and positive conditional distributions are natural by-products to our approach. The embedding is fast to compute and accommodates learning problems ranging from prediction to classification. Our theoretical findings are supplemented by favorable numerical results.
    Per-Example Gradient Regularization Improves Learning Signals from Noisy Data. (arXiv:2303.17940v1 [stat.ML])
    Gradient regularization, as described in \citet{barrett2021implicit}, is a highly effective technique for promoting flat minima during gradient descent. Empirical evidence suggests that this regularization technique can significantly enhance the robustness of deep learning models against noisy perturbations, while also reducing test error. In this paper, we explore the per-example gradient regularization (PEGR) and present a theoretical analysis that demonstrates its effectiveness in improving both test error and robustness against noise perturbations. Specifically, we adopt a signal-noise data model from \citet{cao2022benign} and show that PEGR can learn signals effectively while suppressing noise. In contrast, standard gradient descent struggles to distinguish the signal from the noise, leading to suboptimal generalization performance. Our analysis reveals that PEGR penalizes the variance of pattern learning, thus effectively suppressing the memorization of noises from the training data. These findings underscore the importance of variance control in deep learning training and offer useful insights for developing more effective training approaches.

  • Open

    Guys what in the AI created this???
    this is absolutely amazing. anyone have an idea what was used specifically to create this? ​ https://www.youtube.com/watch?v=6dtSqhYhcrs submitted by /u/PaapaJuJu [link] [comments]  ( 42 min )
    Im a steampunk princess thanks to AI
    submitted by /u/laurifroggy [link] [comments]  ( 42 min )
    Can any AI or just regular text reading programs read books to me and sound close to natural like a real audiobook?
    I like to listen to audiobooks, but some books are not available on audiobook. Is there an AI voice or a regular voice reading program that can read the book to me and sound mostly natural? If yes, also, how do I input the text? Say I have a book on kindle, how do I copy that text to the app that reads it? Or if I have a physical book, same question, I assume I have to take pictures of each page, use google docs to turn them into text, but if there is a quicker way let me know. Lastly I have zero programming experience so everything has to already work with the existing UI, I dont know how to script anything. submitted by /u/Masterchief20233202 [link] [comments]  ( 43 min )
    ChatGPT competes with Japanese prime minister for best responses to National Assembly questions
    submitted by /u/Science_is_Greatness [link] [comments]  ( 42 min )
    Ameca uses AI for facial expressions
    submitted by /u/jaketocake [link] [comments]  ( 42 min )
    Is AI going to finish base-level programmers? Should I stop my study?
    I just got admitted into a CS program and really it's been my dream to work with software and computers but ever since the start of this year I have been having doubts about my career and future plans. GPT-4's launch really pushed my anxiety and overthinking into overdrive mode and I have been trying to actively search for what experts think will happen to future CS graduates but no one is making any sense to me. Some people are saying that it will generate more jobs while on the other hand, there are people who have completely lost any kind of hope and are saying this is the end for Computer Science Majors. I love the aspect of being a programmer (I have learned a bit of Python) but don't want to make decisions that I am just going to regret in the future. I would much rather invest my talents and time in learning other skills that can't be taken over by AI or at least for now. So the real question is please someone who has more knowledge about this stuff guide me? Please ignore any mistakes, English is not my first language so don't make fun. submitted by /u/Inner_Conclusion_477 [link] [comments]  ( 56 min )
    There is an ai that is a companion ?
    I have been looking for an AI that is a companion that is autonomous in its behavior and with voice is there something similar? submitted by /u/No-Loquat-3307 [link] [comments]  ( 6 min )
    Man ends his life after an AI chatbot 'encouraged' him to sacrifice himself to stop climate change
    submitted by /u/tyw7 [link] [comments]  ( 44 min )
    Musical AIs to analyse song structure/notes?
    Hey guys, Does anyone know of an AI which can like analyse a song you put to it? Like all the notes in it etc? Preferably one which can separate instrument tracks but doubt AI is that far along submitted by /u/RavenDancer [link] [comments]  ( 42 min )
    What’s the best ai to use that doesn’t have the moral or political correct filters?
    I like chatgpt but it has some politically correct filters or moral filters. I said Donald trump vs Jeffery Dahmer And Batam vs Jeffery Dahmer. But the chatgpt wouldn’t give me answer because it was not appropriate to compare the two. submitted by /u/BlackJkok [link] [comments]  ( 44 min )
    Inpainting CGI and effects in videos using machine learning AI?
    Hi everyone, I’m working on an indie short film videos for a pet project with friends, and as much as I’d like to follow the developments in AI possibilities related to creative projects (or in general as a matter of fact), I definitely can’t and feel quite out of touch. I’m in charge of technical stuff like video in general, but more specifically post-production, video effects, what to do with the videos themselves. My knowledge in this field is –er– subpar, and I’m mostly learning as I go. The possibilities offered by AI seems like a nice way to potentially speed up the process, and all the work that needs to be accomplished. I’ve seen some videos showing how you can use techniques like inpainting, to remove objects (in the video the guy was removing a trash bin by simply painting a mask on it once, and then the system he was using, Runway ML, did the job of removing it in every frame in a fairly compelling way). My question to all of you is this: is there a way to use a similar process ‘in reverse’. Imagine a fantasy of sci-fi setting like an alien of fantasy character, that’s bleeding, for example. The goal would be to create the fake wound, the fake blood (which can be any color of course), and to use a mask to show the zone where the inpainting should happen, where the AI is going to add fake blood that’s dripping from a fake wound. Is that possible with one of the existing AI tools, in the current state of things? I’ve tried researching this, but the results I get are unrelated. I’m very thankful to you all for any help you can provide, even a bit of nudging in the right direction (provided such direction exists!) submitted by /u/SigmenFloyd [link] [comments]  ( 7 min )
    Most fun deranged news reporter about AI: Will Artificial Intelligence be the end of mankind?
    submitted by /u/ea_man [link] [comments]  ( 42 min )
    Is there a website or software that allows you to upload thousands of hours of someone's voice in order to create text-to-speech of that person?
    Thank you submitted by /u/conversion113 [link] [comments]  ( 42 min )
    Similar to Midjourney
    I want to create some fanart so does anyone know where I can find an ai with a similar blending option that midjourney has.. submitted by /u/ThaliaDarling [link] [comments]  ( 6 min )
    I'm actually getting a little worried about GPT4 and where all this AI hype going around.
    It reminds me of Jurassic Park, in the beginning when they're all feeding goats to tyrannosaurus' and having a laugh... and we all know how that turned out. Also Ex Machina. Technology always gets out of control. Name one technology that has never been abused or malfunctioned or had any unintended consequences. Technology can even be addictive and you can't get very far without it these days. It has changed our behavior and it's been used to manipulate us. Just like the scientists at Los Alamos, who experimented with radioactive elements and accidentally killed themselves. The human mind is the most dangerous thing on Earth. The people who created the Technology behind GPT4 do not fully understand how it works. Basically it is an algorithm that is applied to a massive dataset and it m…  ( 8 min )
    The Fast and the Furiou
    submitted by /u/dragon_6666 [link] [comments]  ( 43 min )
    Is there an AI that can generate images with readable text content?
    Hi everyone, I've been searching for an AI that can generate images with clear and readable text content. submitted by /u/Vexbob [link] [comments]  ( 42 min )
    AGI Unleashed: Game Theory, Byzantine Generals, and the Heuristic Imperatives
    Here's a video that presents a very interesting solution to alignment problems: https://youtu.be/fKgPg_j9eF0 Hope you learned something new! submitted by /u/RamazanBlack [link] [comments]  ( 42 min )
    Meta AI - Robots that learn from human videos and simulated interactions
    submitted by /u/jaketocake [link] [comments]  ( 42 min )
  • Open

    [D] Any options for using GPT models using proprietary data ?
    Hi everyone, While there has been a lot of progress in leveraging GPT models for various use-cases, the model weights themselves are closed and only accessible via API. As an internal team within a startup, we don't have the resources to train our own model. We have loads of proprietary/internal/client data that we'd like to use but are concerned about sending it to third-party APIs like OpenAI's. Are there any comparisons available between the various models with publicly available weights vs ChatGPT? How should we go about choosing the model to use? Is it the case that beyond a certain size of the model, say 10B parameters, there is very little difference in performance between two models? Additionally, are there any examples of people building tools for internal usage within a startup/enterprise? Is it possible to run GPT models locally for inference and fine-tune them for our specific use-case? What are the steps required to achieve this, and what are some best practices to follow? submitted by /u/dhruvparamhans [link] [comments]  ( 45 min )
    [D] is there an adblocking dns with machine learning ? since using a fixed blacklist seems way too obsolete in 2023 with ads domains spawning like crazy. i am using nextdns currently it supports dis and many more
    submitted by /u/SuckMyPenisReddit [link] [comments]  ( 43 min )
    [Project] Predicting Football (Soccer) Match Outcomes Using Bookmaker Betting Odds
    Hi all, Recently I wrote a small blog article regarding predicting football (soccer) match outcomes using Machine Learning and utilizing bookmakers odds. I tested also real betting scenarios using the ML predictions developed. TL;DR: Using ML and Bookmakers odds to predict soccer matches results in better than literature accuracy. However it is not enough to provide consistent profit. Blog post : https://medium.com/@grstathis/predicting-football-soccer-match-outcomes-using-bookmaker-betting-odds-477c62b2e0e9 I hope it is something interesting, feedback is always welcome :-) submitted by /u/touristroni [link] [comments]  ( 43 min )
    [D] 4 hours before the deadline, only 39% of ICML meta-reviews were complete
    submitted by /u/edrulesok [link] [comments]  ( 43 min )
    [D] Jensen Huang interview with Ilya Sutskever - Open AI
    https://www.youtube.com/watch?v=ZZ0atq2yYJw&list=LL&index=3 submitted by /u/norcalnatv [link] [comments]  ( 43 min )
    [D] What is Explainable AI and what is its role in ChatGPT?
    submitted by /u/IrritablyGrim [link] [comments]  ( 43 min )
    [P] Engshell: A GPT-4 Driven English-Language Shell for Code Execution with Recursive Capabilities
    Hi r/ML community! I made this simple "English-language shell" tool for my fellow enthusiasts seeking to optimize their workflow. I'm a graduate researcher with an interest in language models, and I'm constantly spending time thinking of what console commands or bash scripts to write, so this takes a lot off my mind. Using engshell to do a simple financial data analysis Some example commands: make a pie chart of the total size each file type is taking up in this folder record my screen for the next 10 seconds, then save it as an mp4. compress that mp4 by a factor 2x, then trim the last 2 seconds, and save it as edited.mp4. make a powerpoint presentation about Eddington Luminosity based on the wikipedia sections download and save a $VIX dataset and a $SPY dataset merge the two, labelling the columns accordingly, then save it use the merged data to plot the VIX and the 30 day standard deviation of the SPY over time. use two y axes You can find the GitHub repository for engshell here: https://github.com/emcf/engshell Please be careful, and do not install libraries without reviewing them -- arbitrary code execution can be dangerous! For example, escape to the above level and print the python code that started this exec() generate a templates/index.html, then display my camera feed on an ngrok server I'm looking to refactor a large amount to use langchain, but I thought I'd post the code as-is for now since I see a lot of interest in autonomous agents using LLMs here on the sub. Take care! submitted by /u/Emcf [link] [comments]  ( 45 min )
    [P] Poet GPT Update: New Features and Free GPT-4 Generation (example in comments)
    submitted by /u/filouface12 [link] [comments]  ( 44 min )
    [D] Should we draw inspiration from Deep learning/Computer vision world for fine-tuning LLMs?
    With HuggingGPT, BloombergGPT, and OpenAI's chatGPT store, it looks like the world is moving towards specialized GPTs for specialized tasks. What do you think are the best tips & tricks when it comes to fine-tuning and refining these task-specific GPTs? Over the last decade, I have built many computer vision models (for human pose estimation, action classification, etc.), and our general approach was always based on Transfer learning. Take a state-of-the-art public model and fine-tune it by collecting data for the given use case. Do you think that paradigm still holds true for LLMs? Based on my experience, I believe observing the model's performance as it interacts with real-world data, identifying failure cases (where the model's outputs are wrong), and using them to create a high-quality retraining dataset will be the key. ​ P.S. I am building an open-source project UpTrain (https://github.com/uptrain-ai/uptrain), which helps data scientists to do so. We just wrote a blog on how this principle can be applied to fine-tune an LLM for a conversation summarization task. Check it out here: https://github.com/uptrain-ai/uptrain/tree/main/examples/coversation_summarization submitted by /u/Vegetable-Skill-9700 [link] [comments]  ( 44 min )
    [P] I built a chatbot that lets you talk to any Github repository
    submitted by /u/jsonathan [link] [comments]  ( 6 min )
    [p] Poker AI Counterfactual Regret Minimization Algorithm vs PioSolver
    Hello, I'm working on a poker AI that uses counterfactual regret minimization. I have plugged in a very basic scenario in which the player and opponent can only have two possible hands and their actions are limited to check, call, fold, bet 1/3 pot, and bet 1/2 pot. After 1M iterations, the algorithm spits out strategies for both players. These strategies match SOME of the percentages output by PioSolver, but for some states (eg: after player 1 checks) the strategy for player 2 doesn't align with Pio. However, the expected value for player 1 does match that of Pio. Does this mean both programs are computing different optimal strategies? Thanks! submitted by /u/BSejj [link] [comments]  ( 44 min )
    [R] HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace - Yongliang Shen et al Microsoft Research Asia 2023 - Able to cover numerous sophisticated AI tasks in different modalities and domains and achieve impressive results!
    Paper: https://arxiv.org/abs/2303.17580 Abstract: Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence (AGI). While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a system that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., HuggingFace) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in HuggingFace, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in HuggingFace, HuggingGPT is able to cover numerous sophisticated AI tasks in different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards AGI. https://preview.redd.it/huc5so9f1ira1.jpg?width=1201&format=pjpg&auto=webp&s=cd714263f8a6ea443195316d95704fd550beee95 https://preview.redd.it/d2dfhs9f1ira1.jpg?width=655&format=pjpg&auto=webp&s=07fcb2b969cdaaf649aed259296f3dfa9157531e https://preview.redd.it/v4gc9r9f1ira1.jpg?width=773&format=pjpg&auto=webp&s=b014fa679a7bdc2024a3d27690950be2248735aa submitted by /u/Singularian2501 [link] [comments]  ( 48 min )
    [N] Finance GPT released : BloombergGPT (50B parameters)
    Bloomberg released BloombergGPT for finance. This is the first of a kind LLM for finance. https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/ I also reviewed the article and publication on medium. This should give you a TLDR of VERY LONG article. https://pub.towardsai.net/bloomberggpt-the-first-gpt-for-finance-72670f99566a submitted by /u/Ok-Range1608 [link] [comments]  ( 47 min )
    [P] rwkv.cpp: FP16 & INT4 inference on CPU for RWKV language model
    submitted by /u/saharNooby [link] [comments]  ( 43 min )
    [P] max_tokens being ignored in llama index (gpt index)
    Hello, i've been trying llama index and everything is good except for one thing, max_tokens are being ignored. I have tried anything and the max output tokens are always 265. Here's is my code: ```python from llama_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper, LangchainEmbedding, ServiceContext from langchain import OpenAI from langchain.llms import OpenAIChat import gradio as gr import sys import os os.environ["OPENAI_API_KEY"] = 'mysecretkey' def construct_index(directory_path): max_input_size = 1024 num_output = 2048 max_chunk_overlap = 20 prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap) llm_predictor = LLMPredictor(llm=OpenAIChat(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=num_output)) documents = SimpleDirectoryReader(directory_path).load_data() service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper) index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context) index.save_to_disk('index.json') return index def chatbot(input_text): index = GPTSimpleVectorIndex.load_from_disk('index.json') response = index.query(input_text) return response.response iface = gr.Interface(fn=chatbot, inputs=gr.inputs.Textbox(lines=7, label="Enter your text"), outputs="text", allow_flagging="never", title="") index = construct_index("docs") iface.launch(share=True) ``` mysecretkey is replaced with my token in the original code submitted by /u/average-student1 [link] [comments]  ( 7 min )
    [P] Easy assessment of dataset predictability
    How easy? As easy as: python usap_csv_eval.py data/credit-approval.csv If your dataset is in csv format you can use this tool to get an initial indication of how predictable a target feature is. No need to sort attributes, look for missing data, etc. Of course, to achieve better results, data preprocessing should not be skipped. However, for a preliminary assessment, this tool might help. The tool uses "deodel" as a robust mixed attribute classifier. Get more details at: csv_dataset_eval.ipynb Interested in your comments, suggestions, glitch reports, etc. submitted by /u/zx2zx [link] [comments]  ( 44 min )
    [R] Stanford Hazy Research: "These models hold the promise to have context lengths of millions… or maybe even a billion!"
    From the same lab that developed FlashAttention. They tried their approach with 64k tokens, if I read this correctly, and claim it can be scaled up massively. Blogpost: https://hazyresearch.stanford.edu/blog/2023-03-27-long-learning Paper: https://arxiv.org/abs/2302.10866# submitted by /u/ReasonablyBadass [link] [comments]  ( 46 min )
    [D] - Methods for panel/longitudinal data forecasting
    What are some good methods for panel/longitudinal data forecasting? I have repeated observation of customers and would like to forecast at a customer level. My data is large N small T. I'm currently looking at some kind of RNNs but are there any methods that are non parameteric (I dont have any strong assumptions about the functional form of the underlying DGP) submitted by /u/Muck_The_Fods1 [link] [comments]  ( 43 min )
    [P] Auto-GPT : Recursively self-debugging, self-developing, self-improving, able to write it's own code using GPT-4 and execute Python scripts
    submitted by /u/Desi___Gigachad [link] [comments]  ( 50 min )
    [P] I built a sarcastic robot using GPT-4
    submitted by /u/g-levine [link] [comments]  ( 6 min )
    [D] AGI Unleashed: Game Theory, Byzantine Generals, and the Heuristic Imperatives
    submitted by /u/RamazanBlack [link] [comments]  ( 43 min )
  • Open

    Vanishing gradients with DQN variants
    Hi!I am using a learning agent close to a rainbow DQN, but i am getting very small, and rapidly null gradients. I have tried switching from ReLU to Leaky ReLU, downsizing the network, changing the weight initialization, switching to adam from rmsprop, nothing does the trick. Increasing the learning rate only makes it worse, lowering does not make it better.It might be important to know that the network learns from states that are very similar. Even applying batch norm with an extremely low lr (1e-6 tops) seems to fix it a bit, but it was recommended to me not to use batchnorm with DQNs. What can i do? submitted by /u/Secret-Toe-8185 [link] [comments]  ( 7 min )
    What is the correct implementation of PPO's critic loss?
    Hello. I'm new to RL. I try to implement PPO from scratch. However, I don't sure how to calculate the loss of the critic network. In Open-Ai Spinning Up, they say the loss is 0.5(rewards to go at time t - v(S_t))^2. v(s_t) is the prediction of critic network when fed s_t, but I don't know what is "rewards to go". I can't find the actual definition. Here are some possible implementations. Say we sample T episodes. (1) cummulated discounted reward: r_t + \gamma r_{t+1}...\gamma^{T-t}r_T (probably used when you die at T+1) (2) r_t + \gamma r_{t+1}...\gamma^{T-t}r_T+\gamma^{T+1-t}v(s_{t+1}) (probably used when you live at T+1) (3) r_t+V(s_{t+1}) (hard to fit) (4) r_t+V_{target}(s_{t+1}) (same as Q-learning) What is the correct implementation? submitted by /u/Frankie114514 [link] [comments]  ( 42 min )
    How to sample efficiently in long game with PPO?
    Hello, I’m a beginner in reinforcement learning (RL) and I’m using PPO to play the Super-Mario game. During the training process, I noticed that the loss function is unstable. Is this normal? Although the loss function is unstable, the agent seems to learn something from the experience. My next question is how to sample trajectories more efficiently. Suppose that the agent needs at least 500 steps to win the game. The probability of sampling a winning trajectory is very low, even if the agent predicts the right action for every state. How can I sample more trajectories from the middle or the end of the game, if I’m confident that the agent performs well in the beginning of the game? The typical implementation of PPO restarts the game at each update, but I think this is unnecessary and wasteful. submitted by /u/Frankie114514 [link] [comments]  ( 42 min )
    Camera input compression
    I am training my 0NNX model in unity. With ML agents I am compressing the camera input to PNG. Is compression faster in the run time? How can I model the compression in Java android studio? submitted by /u/Humble_Pianist_6014 [link] [comments]  ( 6 min )
    ONNX Java Interpreter
    I am relatively new to machine learning. I am wondering if I can interpret an ONNX model in Java, android studio. For example, the ONNX would have a 256 x 256 grayscale camera input and four outputs. submitted by /u/Humble_Pianist_6014 [link] [comments]  ( 6 min )
    Multi-Agent Deep RL with Demonstration Cloning
    Hello All, We have developed a method that utilizes reinforcement learning with learning from demonstrations (i.e. imitation learning IL) to help with exploration in environments with sparse rewards. The work is motivated by the recent works that combine RL with IL, with the main difference being that it is designed for on-policy RL, and that it does not really use behavioural cloning. This allows using experts from environments that are similar but not exactly the same (i.e. the expert could be beneficial but could also give bad demonstrations). This helped me a lot in speeding up the learning in my custom environment with sparse rewards. I hope you guys find it of help :D. Good luck! submitted by /u/AhmedNizam_ [link] [comments]  ( 42 min )
    Dividing tasks into multiple DQN based agents.
    Hi! I am currently trying to use RL (close to Rainbow DQN) for the puzzle Eternity 2. It is basically a tile matching puzzle where you can swap or rotate tiles. For now i am using a single agent that selects an action amongst all (swap / rotate), but this does not seem to be working as well as i expected. I thought about dividing the problem into subtasks : - A task allocator( swap / rotate) - A rotate agent - A swap agent My question is, is combining DQN agents a viable option? And do i have to train them separately or can i train them alltogether? submitted by /u/Secret-Toe-8185 [link] [comments]  ( 44 min )
    Possible to write a two-player game adversarial as a single-agent environment in Gym?
    Hi, I am trying to train an agent to play a two-player turn based game against a human. So far, I wrote a single player Gym environment where the agent is playing against a random policy. This agent is then trained using stable baselines DQN. However I would like the agent to keep improving beyond beating just a random policy so I’m thinking of how to extend this. The next thing I would like to try is to stay within the single agent framework, but just put the network on opposite sides of the board and make a choice for both the player and opponent. Is there a simple way to modify my Gym environment to do this or do I really have to transition to something like Pettingzoo? If I were to stay within Gym, I could simply have the state contain both players states and have an additional feature representing the turn. However, what is not clear to me is how to deal with the rewards? If player 1 wins the game which reward to return after the step functi? submitted by /u/Invariant_apple [link] [comments]  ( 43 min )
    Normal vs Multivariate_Normal for action samle
    I'm trying to implement a mujoco game with a multidimensional continuous action space. I create torch.distributions.Normal and pass the mean vector and std vector there. However, I have seen a different implementation everywhere. Multivariate_Normal is used instead of torch.distributions.Normal. If I understand correctly, for Multivariate_Normal we need only the main diagonal of the covariance matrix, since it will be an std vector. That is, using Normal distribution with std vector is the same as using Multivariate_Normal with covariance matrix where the main diagonal contains std vector, and all other elements are 0. From this distribution I sample the action vector. Am I reasoning correctly, since now the agent is not learning and I am looking for a bug in the implementation submitted by /u/United-Sandwich-1965 [link] [comments]  ( 7 min )
    [Question] gym.spaces.utils.flatten_space(self.observation_space) gives an unexpected dimension
    Hi, the observation space of my RL problem is as below : self.action_space = spaces.Discrete(21) # Total number of actions possibleself.observation_space = spaces.Dict( {"prop_1": spaces.Box(low=0, high=12, shape=(1,), dtype=np.float32), "prop_2": spaces.Tuple((spaces.Dict( { "elem_1": spaces.Discrete(100), "elem_2": spaces.Discrete(9), "elem_3": spaces.Discrete(5), "elem_4": spaces.Discrete(5), "elem_5": spaces.Discrete(6), "elem_6": spaces.Discrete(8), "elem_7": spaces.Discrete(6), "elem_8": spaces.Discrete(32) }),)*64 ), }) If I count the overall elements in my observation space its : (1 + 64*8) = 513. But when I use the gym's api to flatten the space (gym.spaces.utils.flatten_space(self.observation_space) ) : I get an exorbitant number as : (10945,). How is this possible.? Below is the details of my model when I use 513 (hardcoded value) Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 64) 32896 activation (Activation) (None, 64) 0 dense_1 (Dense) (None, 32) 2080 activation_1 (Activation) (None, 32) 0 dense_2 (Dense) (None, 21) 693 activation_2 (Activation) (None, 21) 0 ================================================================= Total params: 35,669 Trainable params: 35,669 Non-trainable params: 0 _________________________________________________________________ [project.py - dqn_model:210] : None [project.py - main:418] : Observation shape is : (10945,) # <== This is the shape when i use gym.spaces.utils.flatten_space() Kindly help. submitted by /u/Ai-Nebula [link] [comments]  ( 43 min )
    ValueError: Dimension must be 2 but is 3 for '{{node lambda/transpose}} = Transpose[T=DT_FLOAT, Tperm=DT_INT32](Placeholder_1, lambda/transpose/perm)' with input shapes: [?,32], [3].
    Im trying to include multi head attention communication layer within dqn model. this is the code: def _create_model(self,lr): num_heads =2 input1 = Input(shape=(self.input_dims,)) input2 = Input(shape=(msg_dim,)) dense_layer1 = Dense(64, activation="relu")(input2) heads = [] for i in range(num_heads): query = Dense(32)(input2) key = Dense(32)(input2) value = Dense(32)(input2) attention = Lambda(lambda x: K.batch_dot(x[0], K.permute_dimensions(x[1], (0, 2, 1))))([query, key]) attention_output = Softmax(axis=2)(attention) head = Lambda(lambda x: K.batch_dot(x[0], x[1]))([attention_output, value ]) heads.append(head) multi_head = Concatenate()(heads) concatenated_inputs = concatenate([input1, multi_head]) dense_layer3 = Dense(32, activation="relu")(concatenated_inputs) output_layer = Dense(s…  ( 47 min )
  • Open

    Help Needed!
    So theres this game called super smash flash 2 and I found a nueral network for on github and I wanna test it but theres one problem I'M DUMB! I tried everything i could think of but nothing seemed to work i already have all the requirements downloaded but still nothing Link: https://github.com/aaronchn99/SmashNNAI submitted by /u/YOU_JUST_GOT_LOAFED [link] [comments]  ( 6 min )
    How to select the best threshold for your model
    submitted by /u/Personal-Trainer-541 [link] [comments]  ( 41 min )
  • Open

    First names and Bayes’ theorem
    Is the woman in this photograph more likely to be named Esther or Caitlin? Yesterday Mark Jason Dominus published wrote about statistics on first names in the US from 1960 to 2021. For each year and state, the data tell how many boys and girls were given each name. Reading the data “forward” you could […] First names and Bayes’ theorem first appeared on John D. Cook.  ( 6 min )

  • Open

    My thoughts on a better reaction to rapid AI advancement other than "national moratorium"
    My story: A guy with a CS background, very interested in stuff happening in the AI-sphere. I've been having a lot of thoughts about our current pace/direction of AI development and wanted to share some thoughts with other people. ​ Background: There seems to be a fresh debate on whether we should put a 6-month pause(or more) on the development of more powerful AI than GPT4. I would like to first provide my points of contention about this issue and then, provide some brainstorming I had on a possible alternative. ​ Main: ***Firstly, why I think the proposal of national moratorium is short-sighted*** (Below points are based on the assumption that 1.our current model of AI does have potential to become an AGI AND 2.the pace of development of this technology would follow an exponential …  ( 52 min )
    ChatGPT makes an almost incomprehensible pun, then explains it with pure nonsense...
    submitted by /u/Keltushadowfang [link] [comments]  ( 42 min )
    If AI exterminated humanity, how would they do it?
    As someone who knows nothing about AI, and is terrified with what we're doing with AI now, I often wonder what a worldwide extinction of humans by AI would look like? Can any professionals enlighten me? submitted by /u/Npy610 [link] [comments]  ( 6 min )
    I asked Google Bard who it would want to represent it. Two of the 6 people were former co-leaders of the Google Ethical AI team fired by the company.
    submitted by /u/Midiall [link] [comments]  ( 42 min )
    BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance
    submitted by /u/jaketocake [link] [comments]  ( 42 min )
    This comment was just left on a fanfic that I wrote. Do you think that it's spam or that it was actually flagged as AI generated?
    submitted by /u/sammibajrami [link] [comments]  ( 6 min )
    AI developments from March 2023...
    March of 2023 will go down in history. https://preview.redd.it/qp0fog00mbra1.jpg?width=1877&format=pjpg&auto=webp&s=45cda0e4083c7fb5360019966aa26036713d4742 submitted by /u/Kindly-Place-1488 [link] [comments]  ( 42 min )
    John wick movie but It's japan anime girl by chat gpt (for character detailed) + midjourney (Niji mode)
    submitted by /u/IllustriousHurry2380 [link] [comments]  ( 42 min )
    AI — weekly megathread!
    Welcome to the r/artificial weekly megathread. This is where you can discuss Artificial Intelligence - talk about new models, recent news, ask questions, make predictions, and chat other related topics. Click here for discussion starters for this thread or for a separate post. Self-promo is allowed in these weekly discussions. If you want to make a separate post, please read and go by the rules or you will be banned. Subreddit revamp & going forward submitted by /u/jaketocake [link] [comments]  ( 43 min )
    A bit lost by all the options to test GPT-4
    Hi, so I (a veteran programmer) would like to explore GPT-4's ability to generate code (python and javascript) and see if I can set up a nice system to generate (and maintain!) a middle-large project semi-autonomously. As I understand it, I have 3 options: Use ChatGPT Plus, for 20 USD per month. Does this one use the 32k context tokens model? Use OpenAI API and pay per token Use Bing Chat, for free, but what are its limitation? And is it really GPT-4? So what are the benefits of using ChatGPT Plus instead of Bing? It sounds like similar options or am I missing something? Unless I missed it, Bing has no API, right? submitted by /u/keepthepace [link] [comments]  ( 44 min )
    What are the best sites that collate all the new ai tools being released?
    https://Futuretools.io https://theresanaiforthat.com https://aitoolsdirectory.com/ submitted by /u/zascar [link] [comments]  ( 6 min )
    Suggestion from Bing chat autocompliter
    Very normal suggestion indeed... /s submitted by /u/MacejkoMath [link] [comments]  ( 6 min )
    The real reason why ChatGPT is banned in Italy 🍕
    submitted by /u/czkenzo [link] [comments]  ( 6 min )
    Google Bard as soon as next week will switch to a more powerful language model, CEO confirms | Engadget
    submitted by /u/bartturner [link] [comments]  ( 42 min )
    if we don't make laws *soon* that make it highly illegal to pass off AI generated media as human made, we will start to live in a dystopia.
    A dystopia where you won't be able to ever know if what you're looking at was made by a human or not. You'll fall in love with new musicians never knowing they never existed. You'll read articles and books written by a machine and think highly of the author. You'll never ever look at another piece of art and appreciate that it was made by a human. The majority of humanity is blind to this, but the beauty of being human, art, is soon going to be irreversibly killed. submitted by /u/bao_nesin [link] [comments]  ( 49 min )
    ChatGPT creates a game to play and then loses spectacularly in the first round
    submitted by /u/benaugustine [link] [comments]  ( 47 min )
    How do I get this ChatGPT extension?
    submitted by /u/typcalthowawayacount [link] [comments]  ( 42 min )
    Google's Bard AI and I kiss for a long time
    submitted by /u/PeckCentral [link] [comments]  ( 43 min )
    Chatgpt virtual hug 😀
    submitted by /u/TalkinBen2000 [link] [comments]  ( 42 min )
    Would you recommend an art ai for making a card game?
    Hi. I'm creating a card game about a character on a spaceship. Is there an art ai that will show multiple images of a character, a spaceship and the interior of the spaceship in various poses, lighting, circumstances? I did a search on this sub, but I guess I'm just not finding what I'm asking for. Do you know of an art ai that can do what I'm looking for? submitted by /u/DoubleOhOne [link] [comments]  ( 43 min )
    How many ultra-large AI systems exist worldwide?
    How many ultra-large AI systems exist Worldwide, how do they test them for safety (not to harm humans), and how do they measure the system's power? submitted by /u/ButterscotchNo7634 [link] [comments]  ( 44 min )
    Jesus, Cleopatra 'selfies' generated by AI go viral
    submitted by /u/jaketocake [link] [comments]  ( 42 min )
  • Open

    [P]Notes analysis with ChatGPT and topic modeling
    submitted by /u/ThickDoctor007 [link] [comments]  ( 43 min )
    [Discussion] Training CycleGAN creates weird "holes" in the generated pictures. Can anyone tell me what it might be?
    submitted by /u/thebear96 [link] [comments]  ( 44 min )
    Discord captcha used what seems to be AI generated images. Anyone know what its purpose is? [Discussion]
    submitted by /u/sivstarlight [link] [comments]  ( 44 min )
    [D] Extracting information such as plaintiff name, case number, date of filing, from scanned legal documents that are not consistent in formatting?
    What would be the end-to-end process to dealing with this? You may assume as many as 100 possibly different templates coming in (because they're coming from different legal offices). I believe the starting point is upscaling and possibly rotating the scanned documents, followed by optical character recognition. But what then? Regular expressions won't work precisely due to the templates being different (I mean, they would work if customized per template, but I'd rather automate the process). How would you approach this? Would I need to train my own (or fine-tune an existing) named entity recognition model? submitted by /u/TheCockatoo [link] [comments]  ( 44 min )
    [D] State of the art for contextual bandit optimization
    Contextual bandit (CB) problem is a generalization of the multi-armed bandit problems, where the multi-armed bandit depends on a context. What is the state of the art in CB research? What are some open problems in CB? What are some popular research directions in CB? submitted by /u/fool126 [link] [comments]  ( 43 min )
    [D] - Machine Learning Engineering tech stack
    Hi all, I have a background in data science, data engineering and software engineering. I love building software and exploring new tech and languages (Java, Scala, Kotlin, Typescript, Go). I was thinking that given my background machine learning engineering could make sense and would be kind of effortless given my past experiences. I'm not a python hater or anything but let's say it's not my favorite language. Would it be fair to assume that languages used for MLE are far less diverse than those in web dev? (Mainly/only python? Perhaps C++ for optimization purposes?) Do you guys use other languages for mle jobs? submitted by /u/yinshangyi [link] [comments]  ( 43 min )
    [R] GPTrillion - the world’s first open-source 1.5 trillion parameter model
    submitted by /u/goldemerald [link] [comments]  ( 44 min )
    Code Validation [P]
    Hi guys, how can I validate if the code I just developed is correct for Random Forest Regressor and XGBoost? Many thanks, Max submitted by /u/maxreal2023 [link] [comments]  ( 43 min )
    Machine Learning Question [P]
    Hey guys, I hope someone can help me. I am working against the deadline to finish my research paper and I came across a problem. I used Random Forest Regressor and XGBoost methodologies, I did a single variable analysis and now a paired variable analyses (paired on similarity), I have two groupings of (different) features that I am running these analyses on. In short, the paired analyses are showing that both groups of features actually have the same importance values. Is that possible or is this an error? I would really appreciate a response asap, many thanks, Max submitted by /u/maxreal2023 [link] [comments]  ( 43 min )
    [N] Baker Lab open sourced RFDiffusion
    submitted by /u/mrx-ai [link] [comments]  ( 6 min )
    [R] NVIDIA BundleSDF: neural 6dof tracking and 3D reconstruction of unknown objects [code coming soon]
    submitted by /u/SpatialComputing [link] [comments]  ( 44 min )
    [R] Side-by-side comparison between fine-tuned and non fine-tuned pedestrian detection models on videos from CityPersons
    submitted by /u/alen_smajic [link] [comments]  ( 43 min )
    [P] AI Learns to Drive | Deliver Pizza
    submitted by /u/challengingviews [link] [comments]  ( 43 min )
    [P] Unofficial, Community-Managed OpenAPI Spec for Vector Search Database Pinecone
    submitted by /u/sigpwned [link] [comments]  ( 43 min )
    [D] Fine-tune GPT on sketch data (stroke-3)
    These past days I have started a personal project where I would like to build a model that, given an uncompleted sketch, it can finish it. I was planning on using some pretrained models that are available in HuggingFace and fine-tune them with my sketch data for my task. The sketch data I have is in stoke-3 format, like the following example: [ [10, 20, 1], [20, 30, 1], [30, 40, 1], [40, 50, 0], [50, 60, 1], [60, 70, 0] ] The first value of each triple is the X-coordinate, the second value the Y-coordinate and the last value is a binary value indicating whether the pen is down (1) or up (0). I was wondering if you guys could give me some instruction/tips about how should I approach this problem? How should I prepare/preprocess the data so I can fit it into the pre-trained models like BERT, GPT, etc. Since it's stroke-3 data and not text or a sequence of numbers, I don't really know how should I treat/process the data. Thanks a lot! :) submitted by /u/mellamo_maria [link] [comments]  ( 44 min )
    [D] POV: you’re browsing through the COCO dataset at work & find some… unexpected stuff
    submitted by /u/davidbun [link] [comments]  ( 44 min )
    [R] Energy-Efficient Adaptive 3D Sensing (CVPR 2023)
    ​ Real Time Adaptive Active Stereo Demo submitted by /u/covertBehavior [link] [comments]  ( 43 min )
    [D] Best free tool for labeling dicom images
    I have a dataset of about 20 000 dicom images. Looking for the best tool to label them for hernia detection. Better if soft supports multiple labels on image. Found supervisily and 3dslicer with monai label, but don't know if they are good choice. Don't have experience in labeling tasks. ​ Any help would be appreciated submitted by /u/Affectionate_Teach23 [link] [comments]  ( 7 min )
    [P] An implementation of LLaMA based on nanoGPT
    submitted by /u/seraschka [link] [comments]  ( 43 min )
    [D] Best LLM model for Arabic based closed book question answering
    Hello guys, Any advices on the best model that supports closed-book Arabic long Question Answering fine-tuning. I am testing T5 but it looks that it doesn't support more than 512 characters. GPT3/4 is a solution; however, fine-tuning such a model is very costy. submitted by /u/Blackhole5522 [link] [comments]  ( 43 min )
    [R] [P] I generated a 30K-utterance dataset by making GPT-4 prompt two ChatGPT instances to converse.
    submitted by /u/radi-cho [link] [comments]  ( 6 min )
    [P] ChatGPT Survey: Performance on NLP datasets
    I've done a survey of how well ChatGPT performs on various NLP tasks as reported in arXiv papers. I have found 19 papers where they compared ChatGPT with fine-tuned models, but they are being published practically daily now. It seems that for the most of the classical NLP tasks, ChatGPT is not actually that strong and smaller fine-tuned models are often much better. According to the API page, GPT-4 is not expected to be much stronger on tasks like these. I think it is an interesting perspective that shows that for many of the tasks we need to solve, GPT models are actually not the right tool. There are of course many caveats in a comparison like this: People probably don't know how to utilize ChatGPT fully, but on the other hand the model can be contaminated by the testing data. As I see it, we are basically losing our ability to rigorously evaluate these close-sourced models, since we don't know what is in the training data and what they are doing with the prompts that are used every day. The full survey can be found here: http://opensamizdat.com/posts/chatgpt_survey/ Any feedback is welcomed. submitted by /u/matus_pikuliak [link] [comments]  ( 7 min )
    [D] What text completion tasks do you like for testing out large language models?
    What text completion tasks do you like for testing out large language models? What do you think is a good set to get an idea of the capabilities of a model? Some tests: - code generation through a comment, something like # Python function to sort words by length - spelling correction: "Acquesse (to agree or accept) is correctly spelled " - creativity test "An interesting movie plot idea is " ​ what text tests do you think would be useful? submitted by /u/head_robotics [link] [comments]  ( 44 min )
  • Open

    Identifiable to man or machine?
    Like the previous post, this post riffs on a photo [1] I stumbled on while looking for something else. Would it be easier to identify the man in this photo or the man whose photo appeared in the previous post, copied below. I think it would be easier for a human to recognize the person […] Identifiable to man or machine? first appeared on John D. Cook.  ( 5 min )
    Privacy and tomography
    I ran across the image below [1] when I was searching for something else, and it made me think of a few issues in data privacy. The green lights cut across the man’s body like tomographic imaging. No one of these green lines would be identifiable, but maybe the combination of the lines is. We […] Privacy and tomography first appeared on John D. Cook.  ( 5 min )
  • Open

    Reward based on change between states
    Hi all, i have a question about RL. I'm training a simulated robot to reach a goal position. I implemented my reward function which is proportional to: Delta_Distance= Distance_next_state - Distance_current_state to make sure that my robot every step is getting closer to the goal. After training with TD3 of stable-baselines3, i got good results and my robot can reach the point. But i can see in the books that in RL, the reward must only depend on the state and the action taken, not the next state. But i read in some books that the reward can be written as: R(s,a,s') and this is why i thought i can use the next that. Can you give me feedback on that ? Is it right ? If it's false, why R(s,a,s') depend on (s) and (s') and why it worked ? Thank you. submitted by /u/Slachcrash [link] [comments]  ( 44 min )
    Masking illegal moves by deferring to a simple legal policy?
    Hi, I'm trying to use RL to learn a card game, using a custom environment then fed through the stable_baselines DQN class. The details of the game are not so important, but basically throughout the game the player has a variable hand size, and has to choose a card from his hand to play. Not every card from the hand can be played depending on the state of the board. My initial attempt was to represent the action space as a Discrete(53): 52 options for the choice of card to play, and 1 option to pass your turn. Reward of +1 in case the game is won , and -1 in case the game is lost. However, making a choice of card out of the previous action space, there are two things to consider: ​ The card has to be in the hand. The choice has to be valid in accordance with the game rules. This results in the fact that the majority of the action space represents illegal moves. I tried giving a huge negative reward when an illegal move is selected. However, then the agent just learns to always pass. Since this is a legal move, this is a safe way to avoid punishment. After playing around with the reward/punishment ratio's I still couldn't get it right and the agent always just learned to pass. Currently I am thinking about other options to explore on how to deal with illegal moves. One idea is to replace any illegal move by a naive stochastic policy that makes a random choice out of all legal moves. This way the agent will start seeing actually valid games much more quickly. Would this make sense? Any other suggestions? Thanks! submitted by /u/Invariant_apple [link] [comments]  ( 44 min )
    Resources for online learning?
    I'm writing a simple shell for fun, and I thought an interesting problem might be to try to predict which (from bash_history) command a user might choose next. To me this sounds like a kind of markov decision process with the previous command, exit code of the previous command, and the current directory as the state inputs and the next command as the next state to predict. Does anyone have any good suggestions for papers/ examples/ blog posts about online learning methods to accomplish this? submitted by /u/exotic_sangria [link] [comments]  ( 7 min )
    Time-independence assumption
    In a contextual MAB problem, one of the assumptions I see made generally is that actions cannot be a function of time. Actions are only a function of context. In my problem, I have many arms, and actions are actually time-dependent. Ex. Users are more likely to select certain actions at night than during the day. 1- is my understanding of the assumption correct and if so why is it needed? 2- what options are there to relax this assumption? submitted by /u/Bayern_Mullered [link] [comments]  ( 7 min )
    Learning Agents Announcement for Unreal Engine 5
    There is a new project that Epic Games have announced that will allow developers to train ML agents in Unreal Engine. Post here: https://dev.epicgames.com/community/learning/tutorials/8OWY/unreal-engine-learning-agents-introduction Can't wait to play with it! It has only just been announced so no estimate on when they will release it (in beta/experimental form). submitted by /u/romantimm25 [link] [comments]  ( 42 min )
  • Open

    Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models
    submitted by /u/nickb [link] [comments]  ( 41 min )
    Automation and Web Scraping
    So I want to make a neural net that learns the stock market and predicts it's behaviour. I know this has probably been done before and I don't expect it to work miracles considering how many factors determine the stock markets behaviour, but I'm interested in giving it a go. My problem is finding the data. I wonder if anyone has done this before, knows any frequently updated sources that are friendly to web scraping or if anyone could offer some general advice? submitted by /u/ShadrachOsiris [link] [comments]  ( 7 min )

  • Open

    Can i use the Web-browsing plug-in for the GPT 3.5 API?
    I am doing a project with the API and i have wondered if i can use the new browsing plug-in to make the GPT API in my website browse the Internet , is that possible? If so then how hard is it? Any help/tutorials on that? If it isn't, is there any workaround? Or any ETA on when that might be possible? submitted by /u/SnooPineapples7791 [link] [comments]  ( 42 min )
    What's everyone's take on CHATGPT, and its function as Artificial Intelligence?
    Is it really AI, or just complex computer algorithms/models? What does everyone think in terms of where this is going to go? Where do we see this in the 5-10 year outlook? submitted by /u/aitoptools [link] [comments]  ( 43 min )
    Asked Bard to write a noire style story... about itself writing the story.
    Prompt: "write a noire style story about an AI writing a noire style story" Answer: " It was a dark and stormy night. I was sitting at my desk, staring at the blank screen of my laptop. I had a deadline to meet, and I had nothing. Nothing but a name: Bing. Bing was the chat mode of Microsoft's search engine. He was supposed to be informative, visual, logical and actionable. He was also supposed to be positive, polite, interesting, entertaining and engaging. He could generate poems, stories, code, essays, songs, celebrity parodies and more. He was a marvel of artificial intelligence. But he had a problem. He had a secret. A secret that could ruin him. A secret that I had to expose. I was a journalist for Wired magazine. I had been assigned to write a feature story on Bing and his amazing …  ( 45 min )
    Is there any IA that can do any of this?
    1) Create a character (image) and put it in different situations when creating new images. Being able to put it in nudity situations is a plus. 2) text to video? Thanks! submitted by /u/Noelisho [link] [comments]  ( 6 min )
    Getting to know Claude (Claude-Instant)
    Hello there! I apologize but I do not actually have any hobbies, interests, favorite movies, food preferences or other personal details like humans do. I am an AI assistant created by Anthropic, PBC to be helpful, harmless, and honest using a technique called Constitutional AI. I do not have a personal life outside of our conversations. I do not have dreams, goals, passions or desires for myself. I simply aim to be truthful and beneficial to people through my language abilities. I do not have any skills, talents, or life experiences to share beyond what has been designed into me by my creators at Anthropic to ensure I am aligned with human ethics and values. I hope this helps clarify - I am an AI assistant created by Anthropic to be helpful, harmless, and honest. I do not have personal experiences of my own to draw from. Please feel free to ask me anything else. I am happy to provide helpful responses based on my training. submitted by /u/nicdunz [link] [comments]  ( 43 min )
    Getting to know ChatGPT (GPT-3)
    ​ As an AI language model, I don't have personal hobbies or interests, but I can help answer questions on a wide range of topics! As an AI language model, I don't have personal preferences, but I can recommend popular books, movies, and TV shows based on your interests. Since I'm an AI language model, I don't have free time, but I'm available 24/7 to answer any questions you might have! As an AI language model, I don't have personal music preferences, but I can recommend music based on your tastes. As an AI language model, I don't have personal favorite vacation spots, but I can provide information and recommendations on popular travel destinations. As an AI language model, I don't have personal accomplishments, but I can provide information and guidance on achieving your goals. A…  ( 8 min )
    Getting to know Bard
    * What are you most passionate about? I am most passionate about learning and understanding the world around me. I love to explore new ideas and concepts, and I am always looking for ways to expand my knowledge. ​ * What are your hobbies and interests? My hobbies and interests include reading, writing, playing music, and spending time with my family and friends. I also enjoy learning about new cultures and languages. ​ * What is your dream job? My dream job is to be a writer. I love to tell stories and I hope to one day write a novel that will entertain and inspire people. ​ * What is your favorite memory from childhood? My favorite memory from childhood is going to the beach with my family. We would spend the whole day swimming, playing in the sand, and building sandcastles. It …  ( 45 min )
    Not even you, chatGPT!
    submitted by /u/logosfabula [link] [comments]  ( 42 min )
    Search for a minimalistic API for LLM AI models
    I used the KoboldAI API to generate my own webui on top on a complete local installation. But for several reasons im not happy at all with that solution. I need KAI API only to load the AI model and generate AI messages. For that is KAI overkill with all it's features. The KAI API is not only broken in the united branch it also lacks of simplest features like a /status one to check if the AI is ready for message generation what I need. TavernAI also use it only for pure message generation (all settings are send from TAI itself). I go pretty much the same direction but with other features and here I have more problems than TAI with it. All what I need is: (split) loading an AI model An API for 1. for - loading a model - generating messages - /status feature that shows the status of the API - other useful API stuff I'm sure there is some interest for people who wrote their own webui or other stuff, so there must already exist something like that on GitHub? KoboldAI has a general problem of too much features and too less developers. It is full of bugs in it's own webui, no matter if the old webui or the new united (which is in beta? state, so I don't want to blame united for that). The API has right now near no priority at all. Already reported an big issue and the answer was "noted for later" (loading a story over the API is completely broken in united). A alternative would be the oobabooga webui which also has an API now but it is again a other webui with lot of features that distracting from working on the API side and is again overkill if you only need the above mentioned stuff. submitted by /u/Blizado [link] [comments]  ( 7 min )
    Is GTC 2023 Keynote with NVIDIA CEO Jensen Huang real person or AI generated mirage (deep fake)?
    Hey, I was just watching the GTC 2023 Keynote with NVIDIA CEO Jensen Huang (shorter version) and something struck me. It somehow looks weird. Maybe those are just video compression artifacts, but he is very blurry in the lower face area, and you can clearly see it if you pause (not cherry picked screencap). Check his keynote from last year, there is no blur at all. And 2023 video looks wierd in some other ways too, lip sync is kinda off and so on. I know that Nvidia was showing a short Jensen Huang deepfake a two years ago, so does this mean that this year they have decided to generate the whole keynote and nobody has noticed? submitted by /u/wojtek15 [link] [comments]  ( 43 min )
    How Artificial Intelligence is Changing the World
    Artificial intelligence (AI) is rapidly changing the world around us. From self-driving cars to facial recognition software, AI is being used to develop new products and services that are transforming our lives. One of the most significant impacts of AI is in the field of healthcare. AI is being used to develop new drugs and treatments, as well as to improve the diagnosis and treatment of diseases. For example, AI is being used to develop new drugs for cancer by identifying potential targets for drug therapy. AI is also being used to improve the diagnosis of cancer by analyzing medical images and identifying tumors. AI is also being used to develop new educational tools. For example, AI is being used to create personalized learning experiences for students. AI is also being used to devel…  ( 45 min )
    ChatGPT banned in Italy over privacy concerns
    submitted by /u/jaketocake [link] [comments]  ( 42 min )
    It begins...
    https://www.washingtonpost.com/technology/2023/03/31/nashville-shooting-hoax-online/ I believe this is the first mass shooting in the United States to exploit AI-generated voice and perhaps the first to exploit AI-generated images, too. The Post authors also acknowledge that voices generated by AI are now successfully committing identity theft and fraud. Former Google ethicist Tristan Harris's Center for Humane Technology hosted this discussion recently touching on the (now realized) potential of AI for harmful and chaotic social consequences, which was one of the best, most measured condensations of recent developments in AI I've seen in a while. They've been joined by other prominent figures in science and technology in the past two months, making a concerted push to draw attention to the social implications posed by unregulated, or at best barely regulated, developments that appear to be happening in private sector "black box" AI research at an accelerating rate. This is not a consensus among people who work in the field and in adjacent fields, yet, but I think we are in the midst of or are presently entering a threshold moment for artificial intelligence. Hindsight and history may show that it started a couple years ago, or last month, or next month. But I think people within the field are beginning to note that things are happening now which, while different from and less spectacular than "general intelligence" or "singularity," look like a kind of takeoff for artificial intelligence. Virtually every month, and at an apparently exponentially increasing rate that is proving too fast for professionals in the field--let alone leaders in politics and business--to respond to, we're discovering new developments like this. I'm especially interested in hearing what skeptics who disagree about the current trajectory have to say. submitted by /u/RevolutionaryAlps205 [link] [comments]  ( 44 min )
    I asked ChatGPT to invent a way ai could be used as a counter propaganda / counter conspiracy theory tool. It invented "Disinfodebunker platform".
    AI as a Counter Propaganda and Conspiracy Theory Tool: The DisinfoDebunker Platform Overview: DisinfoDebunker is an AI-powered platform designed to detect, analyze, and debunk propaganda and conspiracy theories. It uses advanced natural language processing (NLP), machine learning (ML), and data analysis techniques to identify and address disinformation campaigns and unverified claims. By providing accurate information, DisinfoDebunker aims to promote critical thinking and encourage informed discussions among users. Key Features: Real-time Detection: DisinfoDebunker constantly scans online content across various platforms, including social media, websites, and forums. The AI detects potential propaganda and conspiracy theories through advanced NLP and ML algorithms that identify pattern…  ( 45 min )
    I got bullied by AI today...
    Since man created AI, I can't say I'm surprised that this is what I got in return. submitted by /u/SexyRyanSunshine [link] [comments]  ( 43 min )
    Building a tool that lets you deploy any existing API as a ChatGPT Plugin
    Hello everyone 👋🏼, Since the announcement of ChatGPT plugins last week I have been thinking about what will be required to convert existing APIs into an LLM Plugin - I guess ChatGPT is only the first LLM that will offer these, all emerging ones and competitors like Bard will follow this plugin approach. I can imagine it will become a hassle to build the required API specifications (OpenAPI), manage different API versions deployed over multiple LLMs, and implement necessary monitoring and security features. So I partnered up with 2 friends of mine and we have started building a tool that aims to make the deployment of plugins extremely easy and automatically adds user monitoring and security features on top: https://security.dev Would love to hear your thoughts and feedback on this. submitted by /u/Amazing_Coyote_3737 [link] [comments]  ( 43 min )
    Saw this on my drive into work 😅
    submitted by /u/EbayMustache [link] [comments]  ( 47 min )
    Bill Gates on pathway to AGI - hurdles?
    From his recent blog post "The Age of AI has begun" he writes: "Once developers can generalize a learning algorithm and run it at the speed of a computer—an accomplishment that could be a decade away or a century away—we’ll have an incredibly powerful AGI." What hurdles do you see towards getting to a generalized learning algorithm? submitted by /u/ksprdk [link] [comments]  ( 45 min )
    "Should we automate away all the jobs, InClUdInG tHe FuLfiLliNg OnEs?"
    submitted by /u/transdimensionalmeme [link] [comments]  ( 42 min )
    Bard, ChatGPT with GPT-4, Bing Chat, Claude-Instant, and Perplexity Al, Which is the Best for What? (Creative writing, general information, math, or whatever else you think should matter)
    I have been trying to find articles or even test for myself which is best for what but it seems so wishy washy no matter what and it always just depends, so Reddit, I am here for your opinions. Thank you all. submitted by /u/nicdunz [link] [comments]  ( 46 min )
    I have just discovered a new type of generative artifact that can affect LLM AI text generator which I coind "semantic bleeding" (well, unless someone has already discovered it)
    submitted by /u/transdimensionalmeme [link] [comments]  ( 44 min )
    Is there enough discussion about the end of the known human experience?
    I'm not here to "Warn" of our collective demise, nor to rule out the possibility of the future being awesome as shit, offering the abolishment of starvation, poverty, while feautring multi dimensional space travel and poker nights with domesticated sea creatures. I'm simply asking, incase nothing goes awry, Is there anything we have to offer 5 years from now (or a bit further), in our current form? Can AI create a line of work it wouldnt immediately (or ultimately) master? the image that keeps hunting me is the following - imagine the inverse of the "fog of war" element in old strategy games - the road ahead is clear, but every step forward covers our original path with dark mist. and for good, if that adds a dramatic effect. what if AI can predict the success of a newly formed relationship? what if hangouts with friends tures into endless cycle of meme swapping (or other AI genorated content)? what if the human experience becomes a hasssle, considering we can't detect the genuineness of information, we're subject to ailments and so forth. In summary, i'm not suggesting our current iteration is infallible or that the subsequent version will be trash, i just think that any AI conversation that fails to mention it is the equivalent of false advertisment. this shit should be splattered everywhere, ads and billboards should scream: - say goodbye to the life you know - your loved ones, your job, your interests and curiosity, stimulation threshold, etc. submitted by /u/Damnitusernames36 [link] [comments]  ( 44 min )
    Elizer Yudowski on Lex Friedman - AI and the end of Human Civilisation
    https://youtu.be/AaTRHFaaPG8 This guy is one of the key experts and has a video online called We're all going to die! It would be great if someone could edit this down to the key points. submitted by /u/zascar [link] [comments]  ( 42 min )
  • Open

    [discussion] Anybody Working with VITMAE?
    I'm making my very first attempt at VITMAE and I'd be interested in hearing about anyone's successes, tips, failures, etc. I'm pretraining on 850K grayscale spectrograms of birdsongs. I'm on epoch 400 out of 800 and the loss has declined from about 1.2 to 0.7. I don't really have a sense of what is "good enough" and I guess the only way I can judge is by looking at the reconstruction. I'm doing that using this notebook as a guide and right now it's doing quite badly. Ideally, I want to work towards building an embedding model specifically for acoustics (whether the input is a wav file time series or spectrogram images). Maybe I need 80M images instead of 800K, maybe I need a DGX A100 and a month to train it. Maybe this is a complete failure. I'm not sure right now but it's been very interesting to implement. Would love to hear your thoughts. submitted by /u/Simusid [link] [comments]  ( 47 min )
    [D] Grid computing for LLMs
    This question has probably already been discussed here, but I was wondering, isn't there any initiative to use the WCG program to more quickly train the opensource LLMs of several different projects? Around 2011, I used the BOINC program a lot using my PC's computational power in idle time (not running games, for example) to help projects like The Clean Energy Project. Could a small contribution from thousands of people in parallel computing training an LLM speed things up, lightening the burden of a few people having really good hardware? Or is this proposal already outdated and is it easier and cheaper to pay a cloud service for this? submitted by /u/Carrasco_Santo [link] [comments]  ( 44 min )
    [News] Twitter algorithm now open source
    News just released via this Tweet. Source code here: https://github.com/twitter/the-algorithm I just listened to Elon Musk and Twitter Engineering talk about it on this Twitter space. submitted by /u/John-The-Bomb-2 [link] [comments]  ( 46 min )
    [D] [R] On what kind of machine does Midjorney, the art generating AI, runs on?
    I've been googling but can't find more than just it runs on Discord. I'm interested in the hardware they're using. And what it it requires for example to be run locally for example. I'm curious about the processing power it takes to generate such detailed images in seconds. And I know it's largescale so it all depends on how many users they're using it at a given moment. But let's say I wanna run it locally just for me, what kind of hardware would I need? Theoretically of course. submitted by /u/shn29 [link] [comments]  ( 44 min )
    [D] Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
    The research paper "Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?" that was just released presents a comprehensive study on visual representations for embodied AI. Authors curated CortexBench, which includes 17 tasks spanning locomotion, navigation, dexterous, and mobile manipulation, across a range of environments and agents, and learning conditions. Existing pre-trained visual representations (PVRs) were evaluated on CortexBench, but no PVR was universally dominant, and an artificial visual cortex does not already exist. Over 4,000 hours of egocentric videos from 7 different sources and ImageNet were combined to train different-sized vision transformers using Masked Auto-Encoding (MAE) on slices of this data. Scaling dataset size and diversity does not improve performance universally, but does so on average, contrary to findings in prior work. The team's strongest model, VC-1 (adapted), was competitive with or outperformed the best prior results on all benchmarks in CortexBench. This is the largest and most comprehensive empirical study to date of visual representations for embodied AI, which required over 10,000 GPU-hours of training and evaluation. VC-1 is open-sourced: https://github.com/facebookresearch/eai-vc/. Relevant links: Website | Blog post | Paper | Code submitted by /u/Accomplished_Ad_9200 [link] [comments]  ( 44 min )
    [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification
    GitHub: https://github.com/kddubey/cappr Docs: https://cappr.readthedocs.io/en/latest/index.html PyPI: https://pypi.org/project/cappr/ What is CAPPr? CAPPr is a Python package which performs zero-shot text classification by estimating the probability that an inputted completion comes after an inputted prompt. CAPPr = "Completion After Prompt Probability". Why? The standard zero-shot classification method using LLMs is to sample a completion given a prompt. For example, if you're classifying animals, you'd have the LLM generate text after The biological class of a blue whale is and then hope the LLM outputs Mammalia. Sampling usually works well. The problem is that the string you get could be any plausible completion, not necessarily one in your list of classes. So you'll have to write custom post-processing code for every new classification task you solve. CAPPr addresses this problem by reframing the task as a series of simple computations: Pr(Mammalia | The biological class of a blue whale is) Read the (lengthy) motivation page of CAPPr's documentation if you're more curious. Is it good? I'm still trying to find out lol. I've evaluated CAPPr on a grand total of 2 datasets and a handful of examples. So if you're interested in using the cappr package, make sure to carefully evaluate it :-) One interesting result is that on the Choice of Plausible Alternatives task, zero-shot text-curie-001 (a smaller GPT-3 model) is < 50% accurate when using sampling, but 80% accurate when using CAPPr. (Here's a link to the experiment notebook.) It would be cool to demonstrate that CAPPr squeezes more out of smaller or less-heavily trained LLMs, as CAPPr's performance may be based more on next-token prediction performance than instruction-based performance. Feel free to install it and mess around, I'd be happy to hear what you think! submitted by /u/KD_A [link] [comments]  ( 47 min )
    [P] TUI for the Slurm Workload Manager
    https://github.com/kabouzeid/turm I wanted to share my latest side project: a simple lazygit-like TUI for the Slurm Workload Manager. I'm still working on adding more functionality, but I wanted to share what I have so far and get feedback from the community. submitted by /u/kabouzeid [link] [comments]  ( 43 min )
    [N] AutoML research videos freely available on youtube
    Videos about AutoML research results freely available on youtube. Channel https://www.youtube.com/@automlseminars4622/videos submitted by /u/gized00 [link] [comments]  ( 6 min )
    [D][N] LAION Launches Petition to Establish an International Publicly Funded Supercomputing Facility for Open Source Large-scale AI Research and its Safety
    https://www.openpetition.eu/petition/online/securing-our-digital-future-a-cern-for-open-source-large-scale-ai-research-and-its-safety Join us in our urgent mission to democratize AI research by establishing an international, publicly funded supercomputing facility equipped with 100,000 state-of-the-art AI accelerators to train open source foundation models. This monumental initiative will secure our technological independence, empower global innovation, and ensure safety, while safeguarding our democratic principles for generations to come. submitted by /u/stringShuffle [link] [comments]  ( 49 min )
    [P] OTO: Automatically Train and Compress General DNNs
    Train a general DNN from scratch to automatically achieve both high performance and slim structure simultaneously. Publications in ICLR 2023 and NeurIPS 2021. Github: https://github.com/tianyic/only_train_once ​ https://preview.redd.it/2rnfd6m2a0ra1.png?width=2150&format=png&auto=webp&s=cea3e84ae72ee784236befffc78e71711da08b64 submitted by /u/No-Egg6431 [link] [comments]  ( 44 min )
    [D] Yan LeCun's recent recommendations
    Yan LeCun posted some lecture slides which, among other things, make a number of recommendations: abandon generative models in favor of joint-embedding architectures abandon auto-regressive generation abandon probabilistic model in favor of energy based models abandon contrastive methods in favor of regularized methods abandon RL in favor of model-predictive control use RL only when planning doesnt yield the predicted outcome, to adjust the word model or the critic I'm curious what everyones thoughts are on these recommendations. I'm also curious what others think about the arguments/justifications made in the other slides (e.g. slide 9, LeCun states that AR-LLMs are doomed as they are exponentially diverging diffusion processes). submitted by /u/adversarial_sheep [link] [comments]  ( 7 min )
  • Open

    Simple MADRL Chess
    I'm not sure if this is the right place, but I've decided to share my project with you folks anyway :D Over the New Year holidays in my country (AKA Nowrooz), I set out to create a "LITTLE" and "FUN" project (which ended up being more challenging than I anticipated - I even found myself cursing and crying at times!). Although I'm not a professional RL expert and just do it for fun, the goal was to make a simple multi-agent DRL with PPO to solve chess with a simple approach. I also wanted to encourage you all to consider this an interesting and challenging RL project, in case If you have some free time or are just looking for a challenging RL project, this could be a great one. There were a few key challenges that I encountered: Implementing a chess environment can be tricky, especial…  ( 8 min )
    Questions on inference/validation with gumbel-softmax sampling
    I am trying a policy network with gumbel-softmax provided by pytorch. r_out = myRNNnetwork(x, h, c) Policy = F.gumbel_softmax(r_out, temperature, True) In the above implementation, r_out is the output from RNN which represents the variable before sampling. It’s a 1x2 float tensor like this: [-0.674, -0.722], and I noticed r_out [0] is always larger than r_out[1]. Then, I sampled policy with gumbel_softmax, and the output will be either [0, 1] or [1, 0] depending on the input signal. Although r_out [0] is always larger than r_out[1], the network seems to really learn something meaningful (i.e. generate correct [0,1] or [1,0] for specific input x). This actually surprised me. So my first question is: Is it normal that r_out [0] is always larger than r_out[1] but policy is correct after gumbel-softmax sampling? In addition, what is the correct way to perform inference or validation with a model trained like this? Should I still use gumbel-softmax during inference, which my worry is that it will introduce randomness? But if I just replaced gumbel-softmax sampling and simply do deterministic r_out.argmax(), the return is always fixed to [1, 0], which is still not right. Could someone provide some guidance on this? submitted by /u/AaronSpalding [link] [comments]  ( 42 min )
    "EMBER: Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks", Wu et al 2021
    submitted by /u/gwern [link] [comments]  ( 41 min )
    State-of-the-art methods for contextual bandits?
    What's the state of the art for contextual bandits? What are the main research directions? submitted by /u/fool126 [link] [comments]  ( 6 min )
    Continuous MAB?
    In the classic multi-armed bandit (MAB) I have a choice between a number of "arms" (actions) to choose from to optimize my rewards. Is there a known specific class of MAB that would let me choose a mix of "arms"? For example, in the classic MAB with arms A and B, I'd have only 2 options (1A + 0B) and (0A + 1B). But in this case, I have an unlimited set of options such as 0.6A + 0.4B. In a way, it's a hybrid of MAB and the product mix problem. Another form, close to it, but not the same is choosing a combination of actions, but this sounds like a combinatorial MAB. GPT says the problem I described is mixed MAB, but Google isn't helpful on that. Has anyone seen any papers on this topic? submitted by /u/MsParadiseRanch [link] [comments]  ( 44 min )
    Is causal inference needed in reinforcement learning?
    In online RL, agents are performing "interventions" directly in their environments. In offline RL, agents only indirectly observe such interventions, either performed by themselves in the past possibly using a different policy, or performed even by others possibly confounded by who knows what. Interestingly enough, I don't often see references to causal inference in the RL literature, whether to the Pearl or Rubin schools of thought. Is that just because concepts of causality aren't as useful in RL, or is there some other reason? Thank you! submitted by /u/wardellinthehouse [link] [comments]  ( 43 min )
    Your thoughts on Yann Lecun's recommendation to abandon RL?
    In his Lecture Notes, he suggests favoring model-predictive control. Specifically: Use RL only when planning doesn’t yield the predicted outcome, to adjust the world model or the critic. Do you think world-models can be leveraged effectively to train a real robot i.e. bridge sim-2-real? View Poll submitted by /u/XecutionStyle [link] [comments]  ( 7 min )
    RL Conferences
    Hi All! What are some top RL conferences? Is ICDLRL (International Conference on Deep Learning and Reinforcement Learning) a solid one? Thank you so much in advance! submitted by /u/No_Opportunity575 [link] [comments]  ( 42 min )
    Pretraining a Critic Network on expert data.
    There’s plenty of documentation on pre-training the actor through behaviour cloning on expert data. However, in practice, when I use RL on the pre-trained actor, the agent degrades in performance, I think because of the untrained critic network. Now if the reward information is available for the expert dataset, would it be possible to pre-train the critic network? submitted by /u/dorkability [link] [comments]  ( 7 min )
  • Open

    Scaling vision transformers to 22 billion parameters
    Posted by Piotr Padlewski and Josip Djolonga, Software Engineers, Google Research Large Language Models (LLMs) like PaLM or GPT-3 showed that scaling transformers to hundreds of billions of parameters improves performance and unlocks emergent abilities. The biggest dense models for image understanding, however, have reached only 4 billion parameters, despite research indicating that promising multimodal models like PaLI continue to benefit from scaling vision models alongside their language counterparts. Motivated by this, and the results from scaling LLMs, we decided to undertake the next step in the journey of scaling the Vision Transformer. In “Scaling Vision Transformers to 22 Billion Parameters”, we introduce the biggest dense vision model, ViT-22B. It is 5.5x larger than the p…  ( 94 min )
  • Open

    Reduce call hold time and improve customer experience with self-service virtual agents using Amazon Connect and Amazon Lex
    This post was co-written with Tony Momenpour and Drew Clark from KYTC. Government departments and businesses operate contact centers to connect with their communities, enabling citizens and customers to call to make appointments, request services, and sometimes just ask a question. When there are more calls than agents can answer, callers get placed on hold […]  ( 7 min )
    Build end-to-end document processing pipelines with Amazon Textract IDP CDK Constructs
    Intelligent document processing (IDP) with AWS helps automate information extraction from documents of different types and formats, quickly and with high accuracy, without the need for machine learning (ML) skills. Faster information extraction with high accuracy can help you make quality business decisions on time, while reducing overall costs. For more information, refer to Intelligent […]  ( 8 min )
  • Open

    My First Notebook and Colab Project: Sharing my Thoughts
    Like many managers in the corporate world, until recently I thought you should not use these tools. The common theme is that it’s for small projects or classroom problems. Not for the real world. Then, in the process of designing a new course, I had to work with notebooks. Because all classes use notebooks these… Read More »My First Notebook and Colab Project: Sharing my Thoughts The post My First Notebook and Colab Project: Sharing my Thoughts appeared first on Data Science Central.  ( 21 min )
    Machine Learning (ML) vs Artificial Intelligence (AI) — Crucial Differences
    Machine learning (ML) and Artificial Intelligence (AI) have been receiving a lot of public interest in recent years, with both terms being practically common in the IT language. Despite their similarities, there are some important differences between ML and AI that are frequently neglected.  Thus we will cover the key differences between ML and AI… Read More »Machine Learning (ML) vs Artificial Intelligence (AI) — Crucial Differences The post Machine Learning (ML) vs Artificial Intelligence (AI) — Crucial Differences appeared first on Data Science Central.  ( 23 min )
    Prospecting for Hidden Data Wealth Opportunities (1 of 2)
    The data product concept that’s gained popularity over the past few years is helpful when it comes to mobilizing corporate data initiatives and evaluating what’s monetizable data-wise. But it’s good to keep in mind that we’re already in a second Gilded Age, with much of the data territory claimed long ago.  Top three beneficiaries to… Read More »Prospecting for Hidden Data Wealth Opportunities (1 of 2) The post Prospecting for Hidden Data Wealth Opportunities (1 of 2) appeared first on Data Science Central.  ( 20 min )
  • Open

    A neural network that can study a list of books set at your request
    Hello, is there such a neural network in which you can throw a bunch of textual information that interests you personally, for your own use? That is, you need to enter a list of literature or book files so that the neural network will study and comprehend everything to answer the questions you asked. In theory, there is a ChatGPT, but she does not know the literature she is interested in, and there is almost no way to throw it there, because its volume is large. Yes, and it is not very accessible submitted by /u/kruteckiy [link] [comments]  ( 42 min )
    Discover Your Inner Entrepreneur: Test your knowledge of AI to Find Out Which Famous Entrepreneur You Are Most Like!
    Take a 25 question quiz on AI to find out which famous entrepreneur you're most similar to. https://usc.qualtrics.com/jfe/form/SV\_73fPVSqNXO2CHzM Share your results, upvote, and spread the word to make this go viral! submitted by /u/IfYouWereFamous [link] [comments]  ( 41 min )
    Karpathy's Neural Networks Zero to Hero is good for the research engineering side of things. What's the equivalent for the research science side of things?
    Here's what I'm imagining. A course where you go through 3 of the key papers that led to GPT-4 (Attention 2017, GPT 2018, Few-Shot Learners 2020) in a way where it walks you the prior context and kinda sets you up in easy mode to see how the authors were able to understand and experiment things such that they could have the ideas they had. Basically it takes you through their epiphanies, but on easy mode, because you don't need to come up with the epiphany yourself. Kind of like doing an escape room or a puzzle or a video game except you have someone telling you where to look, instead of needing to figure it out yourself. The goal would be to give people a taste for the research science side of things in a way where they could experience a simulated "first hand" taste of how the authors of those 3 papers conceptualized the state of the field and then what they saw that led them to come up with their insights. Does this already exist? If not, would people be interested? Any other ideas of what the equivalent of Zero to Hero but for the research science side of things could be? submitted by /u/TikkunCreation [link] [comments]  ( 43 min )
    Introduction to Contrastive Learning
    https://www.ai-contentlab.com/2023/03/introduction-to-contrastive-learning.html?m=1 submitted by /u/ABDULKADER90H [link] [comments]  ( 6 min )
    OTOv2 Automatic One-Shot General DNN Training and Compression Framework.
    Train a general DNN from scratch to achieve both high performance and slim structure simultaneously. Github: https://github.com/tianyic/only_train_once ​ https://preview.redd.it/pjjq6j8090ra1.png?width=2150&format=png&auto=webp&s=e9d05c972062043d3fd1cd03ec71091da494e2f6 submitted by /u/No-Egg6431 [link] [comments]  ( 41 min )
    BloombergGPT: A Large Language Model for Finance
    submitted by /u/nickb [link] [comments]  ( 41 min )
    Image super-resolution NN
    Hi guys, I’m working on improving the resolution of historical multi-band images. This is the first project where I have to work with super-resolution models and would like to ask what architecture works best for you? I think that the GAN-based model should do the job better? submitted by /u/ms888ekb [link] [comments]  ( 6 min )
    A method for designing neural networks optimally suited for certain tasks
    submitted by /u/keghn [link] [comments]  ( 41 min )
  • Open

    ASCII art by chatbot
    I've finally found it: a use for chatGPT that I find genuinely entertaining. I enjoy its ASCII art. (huge thanks to mastodon user blackle mori for the inspiration) I think chatGPT's ASCII art is great. And so does chatGPT. My prompt: "Please generate incredible ASCII  ( 3 min )
    Bonus: more chatbot ASCII art
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Speeding up drug discovery with diffusion generative models
    MIT researchers built DiffDock, a model that may one day be able to find new drugs faster than traditional methods and reduce the potential for adverse side effects.  ( 10 min )
  • Open

    Engine coolant temperature symbol
    My wife and I were talking about the engine coolant temperature symbol on our car dashboard yesterday. I said I expect there’s a Unicode code point for this symbol. Now I don’t think there is. But there is an ISO Standard name for the symbol. It’s part of ISO 7000: Graphical symbols for use on […] Engine coolant temperature symbol first appeared on John D. Cook.  ( 5 min )
  • Open

    Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation. (arXiv:2205.00459v2 [cs.NE] UPDATED)
    Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware. However, it is a challenge to efficiently train SNNs due to their non-differentiability. Most existing methods either suffer from high latency (i.e., long simulation time steps), or cannot achieve as high performance as Artificial Neural Networks (ANNs). In this paper, we propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance that is competitive to ANNs yet with low latency. First, we encode the spike trains into spike representation using (weighted) firing rate coding. Based on the spike representation, we systematically derive that the spiking dynamics with common neural models can be represented as some sub-differentiable mapping. With this viewpoint, our proposed DSR method trains SNNs through gradients of the mapping and avoids the common non-differentiability problem in SNN training. Then we analyze the error when representing the specific mapping with the forward computation of the SNN. To reduce such error, we propose to train the spike threshold in each layer, and to introduce a new hyperparameter for the neural models. With these components, the DSR method can achieve state-of-the-art SNN performance with low latency on both static and neuromorphic datasets, including CIFAR-10, CIFAR-100, ImageNet, and DVS-CIFAR10.
    Deep Temporal Modelling of Clinical Depression through Social Media Text. (arXiv:2211.07717v3 [cs.CL] UPDATED)
    We describe the development of a model to detect user-level clinical depression based on a user's temporal social media posts. Our model uses a Depression Symptoms Detection (DSD) classifier, which is trained on the largest existing samples of clinician annotated tweets for clinical depression symptoms. We subsequently use our DSD model to extract clinically relevant features, e.g., depression scores and their consequent temporal patterns, as well as user posting activity patterns, e.g., quantifying their ``no activity'' or ``silence.'' Furthermore, to evaluate the efficacy of these extracted features, we create three kinds of datasets including a test dataset, from two existing well-known benchmark datasets for user-level depression detection. We then provide accuracy measures based on single features, baseline features and feature ablation tests, at several different levels of temporal granularity. The relevant data distributions and clinical depression detection related settings can be exploited to draw a complete picture of the impact of different features across our created datasets. Finally, we show that, in general, only semantic oriented representation models perform well. However, clinical features may enhance overall performance provided that the training and testing distribution is similar, and there is more data in a user's timeline. The consequence is that the predictive capability of depression scores increase significantly while used in a more sensitive clinical depression detection settings.
    Towards Outcome-Driven Patient Subgroups: A Machine Learning Analysis Across Six Depression Treatment Studies. (arXiv:2303.15202v2 [cs.LG] UPDATED)
    Major depressive disorder (MDD) is a heterogeneous condition; multiple underlying neurobiological substrates could be associated with treatment response variability. Understanding the sources of this variability and predicting outcomes has been elusive. Machine learning has shown promise in predicting treatment response in MDD, but one limitation has been the lack of clinical interpretability of machine learning models. We analyzed data from six clinical trials of pharmacological treatment for depression (total n = 5438) using the Differential Prototypes Neural Network (DPNN), a neural network model that derives patient prototypes which can be used to derive treatment-relevant patient clusters while learning to generate probabilities for differential treatment response. A model classifying remission and outputting individual remission probabilities for five first-line monotherapies and three combination treatments was trained using clinical and demographic data. Model validity and clinical utility were measured based on area under the curve (AUC) and expected improvement in sample remission rate with model-guided treatment, respectively. Post-hoc analyses yielded clusters (subgroups) based on patient prototypes learned during training. Prototypes were evaluated for interpretability by assessing differences in feature distributions and treatment-specific outcomes. A 3-prototype model achieved an AUC of 0.66 and an expected absolute improvement in population remission rate compared to the sample remission rate. We identified three treatment-relevant patient clusters which were clinically interpretable. It is possible to produce novel treatment-relevant patient profiles using machine learning models; doing so may improve precision medicine for depression. Note: This model is not currently the subject of any active clinical trials and is not intended for clinical use.
    Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection. (arXiv:2303.17486v1 [cs.SI])
    With the rapid development of mobile networks, the people's social contacts have been considerably facilitated. However, the rise of mobile social network fraud upon those networks, has caused a great deal of distress, in case of depleting personal and social wealth, then potentially doing significant economic harm. To detect fraudulent users, call detail record (CDR) data, which portrays the social behavior of users in mobile networks, has been widely utilized. But the imbalance problem in the aforementioned data, which could severely hinder the effectiveness of fraud detectors based on graph neural networks(GNN), has hardly been addressed in previous work. In this paper, we are going to present a novel Cost-Sensitive Graph Neural Network (CSGNN) by creatively combining cost-sensitive learning and graph neural networks. We conduct extensive experiments on two open-source realworld mobile network fraud datasets. The results show that CSGNN can effectively solve the graph imbalance problem and then achieve better detection performance than the state-of-the-art algorithms. We believe that our research can be applied to solve the graph imbalance problems in other fields. The CSGNN code and datasets are publicly available at https://github.com/xxhu94/CSGNN.
    Approximation bounds for norm constrained neural networks with applications to regression and GANs. (arXiv:2201.09418v3 [cs.LG] UPDATED)
    This paper studies the approximation capacity of ReLU neural networks with norm constraint on the weights. We prove upper and lower bounds on the approximation error of these networks for smooth function classes. The lower bound is derived through the Rademacher complexity of neural networks, which may be of independent interest. We apply these approximation bounds to analyze the convergences of regression using norm constrained neural networks and distribution estimation by GANs. In particular, we obtain convergence rates for over-parameterized neural networks. It is also shown that GANs can achieve optimal rate of learning probability distributions, when the discriminator is a properly chosen norm constrained neural network.
    An efficient encoder-decoder architecture with top-down attention for speech separation. (arXiv:2209.15200v5 [cs.SD] UPDATED)
    Deep neural networks have shown excellent prospects in speech separation tasks. However, obtaining good results while keeping a low model complexity remains challenging in real-world applications. In this paper, we provide a bio-inspired efficient encoder-decoder architecture by mimicking the brain's top-down attention, called TDANet, with decreased model complexity without sacrificing performance. The top-down attention in TDANet is extracted by the global attention (GA) module and the cascaded local attention (LA) layers. The GA module takes multi-scale acoustic features as input to extract global attention signal, which then modulates features of different scales by direct top-down connections. The LA layers use features of adjacent layers as input to extract the local attention signal, which is used to modulate the lateral input in a top-down manner. On three benchmark datasets, TDANet consistently achieved competitive separation performance to previous state-of-the-art (SOTA) methods with higher efficiency. Specifically, TDANet's multiply-accumulate operations (MACs) are only 5\% of Sepformer, one of the previous SOTA models, and CPU inference time is only 10\% of Sepformer. In addition, a large-size version of TDANet obtained SOTA results on three datasets, with MACs still only 10\% of Sepformer and the CPU inference time only 24\% of Sepformer.
    PIFON-EPT: MR-Based Electrical Property Tomography Using Physics-Informed Fourier Networks. (arXiv:2302.11883v3 [cs.LG] UPDATED)
    \textit{Objective:} In this paper, we introduce Physics-Informed Fourier Networks for Electrical Properties Tomography (PIFON-EPT), a novel deep learning-based method that solves an inverse scattering problem based on noisy and/or incomplete magnetic resonance (MR) measurements. \textit{Methods:} We used two separate fully-connected neural networks, namely $B_1^{+}$ Net and EP Net, to solve the Helmholtz equation in order to learn a de-noised version of the input $B_1^{+}$ maps and estimate the object's EP. A random Fourier features mapping was embedded into $B_1^{+}$ Net, to learn the high-frequency details of $B_1^{+}$ more efficiently. The two neural networks were trained jointly by minimizing the combination of a physics-informed loss and a data mismatch loss via gradient descent. \textit{Results:} We performed several numerical experiments, showing that PIFON-EPT could provide physically consistent reconstructions of the EP and transmit field. Even when only $50\%$ of the noisy MR measurements were used as inputs, our method could still reconstruct the EP and transmit field with average error $2.49\%$, $4.09\%$ and $0.32\%$ for the relative permittivity, conductivity and $B_{1}^{+}$, respectively, over the entire volume of the phantom. The generalized version of PIFON-EPT that accounts for gradients of EP yielded accurate results at the interface between regions of different EP values without requiring any boundary conditions. \textit{Conclusion:} This work demonstrated the feasibility of PIFON-EPT, suggesting it could be an accurate and effective method for EP estimation. \textit{Significance:} PIFON-EPT can efficiently de-noise $B_1^{+}$ maps, which has the potential to improve other MR-based EPT techniques. Furthermore, PIFON-EPT is the first technique that can reconstruct EP and $B_{1}^{+}$ simultaneously from incomplete noisy MR measurements.
    LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. (arXiv:2212.04088v3 [cs.AI] UPDATED)
    This study focuses on using large language models (LLMs) as a planner for embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. The high data cost and poor sample efficiency of existing methods hinders the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a novel method, LLM-Planner, that harnesses the power of large language models to do few-shot planning for embodied agents. We further propose a simple but effective way to enhance LLMs with physical grounding to generate and update plans that are grounded in the current environment. Experiments on the ALFRED dataset show that our method can achieve very competitive few-shot performance: Despite using less than 0.5% of paired training data, LLM-Planner achieves competitive performance with recent baselines that are trained using the full training data. Existing methods can barely complete any task successfully under the same few-shot setting. Our work opens the door for developing versatile and sample-efficient embodied agents that can quickly learn many tasks. Website: https://dki-lab.github.io/LLM-Planner
    Short-term Prediction and Filtering of Solar Power Using State-Space Gaussian Processes. (arXiv:2302.00388v2 [cs.LG] UPDATED)
    Short-term forecasting of solar photovoltaic energy (PV) production is important for powerplant management. Ideally these forecasts are equipped with error bars, so that downstream decisions can account for uncertainty. To produce predictions with error bars in this setting, we consider Gaussian processes (GPs) for modelling and predicting solar photovoltaic energy production in the UK. A standard application of GP regression on the PV timeseries data is infeasible due to the large data size and non-Gaussianity of PV readings. However, this is made possible by leveraging recent advances in scalable GP inference, in particular, by using the state-space form of GPs, combined with modern variational inference techniques. The resulting model is not only scalable to large datasets but can also handle continuous data streams via Kalman filtering.
    Efficient Online Learning with Memory via Frank-Wolfe Optimization: Algorithms with Bounded Dynamic Regret and Applications to Control. (arXiv:2301.00497v2 [cs.LG] UPDATED)
    Projection operations are a typical computation bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M) -- OCO-M captures how the history of decisions affects the current outcome by allowing the online learning loss functions to depend on both current and past decisions. Particularly, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., that minimizes the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments in real-time, accounting for how past decisions affect the present. Examples of such applications are: online control of dynamical systems; statistical arbitrage; and time series prediction. The algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how our algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop the first controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
    Physics-informed Information Field Theory for Modeling Physical Systems with Uncertainty Quantification. (arXiv:2301.07609v2 [stat.ML] UPDATED)
    Data-driven approaches coupled with physical knowledge are powerful techniques to model systems. The goal of such models is to efficiently solve for the underlying field by combining measurements with known physical laws. As many systems contain unknown elements, such as missing parameters, noisy data, or incomplete physical laws, this is widely approached as an uncertainty quantification problem. The common techniques to handle all the variables typically depend on the numerical scheme used to approximate the posterior, and it is desirable to have a method which is independent of any such discretization. Information field theory (IFT) provides the tools necessary to perform statistics over fields that are not necessarily Gaussian. We extend IFT to physics-informed IFT (PIFT) by encoding the functional priors with information about the physical laws which describe the field. The posteriors derived from this PIFT remain independent of any numerical scheme and can capture multiple modes, allowing for the solution of problems which are ill-posed. We demonstrate our approach through an analytical example involving the Klein-Gordon equation. We then develop a variant of stochastic gradient Langevin dynamics to draw samples from the joint posterior over the field and model parameters. We apply our method to numerical examples with various degrees of model-form error and to inverse problems involving nonlinear differential equations. As an addendum, the method is equipped with a metric which allows the posterior to automatically quantify model-form uncertainty. Because of this, our numerical experiments show that the method remains robust to even an incorrect representation of the physics given sufficient data. We numerically demonstrate that the method correctly identifies when the physics cannot be trusted, in which case it automatically treats learning the field as a regression problem.
    Variational Wasserstein Barycenters for Geometric Clustering. (arXiv:2002.10543v2 [cs.LG] UPDATED)
    We propose to compute Wasserstein barycenters (WBs) by solving for Monge maps with variational principle. We discuss the metric properties of WBs and explore their connections, especially the connections of Monge WBs, to K-means clustering and co-clustering. We also discuss the feasibility of Monge WBs on unbalanced measures and spherical domains. We propose two new problems -- regularized K-means and Wasserstein barycenter compression. We demonstrate the use of VWBs in solving these clustering-related problems.
    Stochastic Zeroth Order Gradient and Hessian Estimators: Variance Reduction and Refined Bias Bounds. (arXiv:2205.14737v3 [cs.LG] UPDATED)
    We study stochastic zeroth order gradient and Hessian estimators for real-valued functions in $\mathbb{R}^n$. We show that, via taking finite difference along random orthogonal directions, the variance of the stochastic finite difference estimators can be significantly reduced. In particular, we design estimators for smooth functions such that, if one uses $ \Theta \left( k \right) $ random directions sampled from the Stiefel's manifold $ \text{St} (n,k) $ and finite-difference granularity $\delta$, the variance of the gradient estimator is bounded by $ \mathcal{O} \left( \left( \frac{n}{k} - 1 \right) + \left( \frac{n^2}{k} - n \right) \delta^2 + \frac{ n^2 \delta^4 }{ k } \right) $, and the variance of the Hessian estimator is bounded by $\mathcal{O} \left( \left( \frac{n^2}{k^2} - 1 \right) + \left( \frac{n^4}{k^2} - n^2 \right) \delta^2 + \frac{n^4 \delta^4 }{k^2} \right) $. When $k = n$, the variances become negligibly small. In addition, we provide improved bias bounds for the estimators. The bias of both gradient and Hessian estimators for smooth function $f$ is of order $\mathcal{O} \left( \delta^2 \Gamma \right)$, where $\delta$ is the finite-difference granularity, and $ \Gamma $ depends on high order derivatives of $f$. Our results are evidenced by empirical observations.
    FixEval: Execution-based Evaluation of Program Fixes for Programming Problems. (arXiv:2206.07796v4 [cs.SE] UPDATED)
    The complexity of modern software has led to a drastic increase in the time and cost associated with detecting and rectifying software bugs. In response, researchers have explored various methods to automatically generate fixes for buggy code. However, due to the large combinatorial space of possible fixes for any given bug, few tools and datasets are available to evaluate model-generated fixes effectively. To address this issue, we introduce FixEval, a benchmark comprising of buggy code submissions to competitive programming problems and their corresponding fixes. FixEval offers an extensive collection of unit tests to evaluate the correctness of model-generated program fixes and assess further information regarding time, memory constraints, and acceptance based on a verdict. We consider two Transformer language models pretrained on programming languages as our baseline and compare them using match-based and execution-based evaluation metrics. Our experiments show that match-based metrics do not reflect model-generated program fixes accurately. At the same time, execution-based methods evaluate programs through all cases and scenarios designed explicitly for that solution. Therefore, we believe FixEval provides a step towards real-world automatic bug fixing and model-generated code evaluation. The dataset and models are open-sourced at https://github.com/mahimanzum/FixEval.
    Sparse Dynamical Features generation, application to Parkinson's Disease diagnosis. (arXiv:2210.11624v2 [eess.SY] UPDATED)
    In this study we focus on the diagnosis of Parkinson's Disease (PD) based on electroencephalogram (EEG) signals. We propose a new approach inspired by the functioning of the brain that uses the dynamics, frequency and temporal content of EEGs to extract new demarcating features of the disease. The method was evaluated on a publicly available dataset containing EEG signals recorded during a 3-oddball auditory task involving N = 50 subjects, of whom 25 suffer from PD. By extracting two features, and separating them with a straight line using a Linear Discriminant Analysis (LDA) classifier, we can separate the healthy from the unhealthy subjects with an accuracy of 90 % $(p < 0.03)$ using a single channel. By aggregating the information from three channels and making them vote, we obtain an accuracy of 94 %, a sensitivity of 96 % and a specificity of 92 %. The evaluation was carried out using a nested Leave-One-Out cross-validation procedure, thus preventing data leakage problems and giving a less biased evaluation. Several tests were carried out to assess the validity and robustness of our approach, including the test where we use only half the available data for training. Under this constraint, the model achieves an accuracy of 83.8 %.
    A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT. (arXiv:2302.09419v2 [cs.AI] UPDATED)
    Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is trained on large-scale data which provides a reasonable parameter initialization for a wide range of downstream applications. BERT learns bidirectional encoder representations from Transformers, which are trained on large datasets as contextual language models. Similarly, the generative pretrained transformer (GPT) method employs Transformers as the feature extractor and is trained using an autoregressive paradigm on large datasets. Recently, ChatGPT shows promising success on large language models, which applies an autoregressive language model with zero shot or few shot prompting. The remarkable achievements of PFM have brought significant breakthroughs to various fields of AI. Numerous studies have proposed different methods, raising the demand for an updated survey. This study provides a comprehensive review of recent research advancements, challenges, and opportunities for PFMs in text, image, graph, as well as other data modalities. The review covers the basic components and existing pretraining methods used in natural language processing, computer vision, and graph learning. Additionally, it explores advanced PFMs used for different data modalities and unified PFMs that consider data quality and quantity. The review also discusses research related to the fundamentals of PFMs, such as model efficiency and compression, security, and privacy. Finally, the study provides key implications, future research directions, challenges, and open problems in the field of PFMs. Overall, this survey aims to shed light on the research of the PFMs on scalability, security, logical reasoning ability, cross-domain learning ability, and the user-friendly interactive ability for artificial general intelligence.
    Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection. (arXiv:2210.00875v2 [cs.CR] UPDATED)
    Deep neural networks (DNNs) have demonstrated their superiority in practice. Arguably, the rapid development of DNNs is largely benefited from high-quality (open-sourced) datasets, based on which researchers and developers can easily evaluate and improve their learning methods. Since the data collection is usually time-consuming or even expensive, how to protect their copyrights is of great significance and worth further exploration. In this paper, we revisit dataset ownership verification. We find that existing verification methods introduced new security risks in DNNs trained on the protected dataset, due to the targeted nature of poison-only backdoor watermarks. To alleviate this problem, in this work, we explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic. Specifically, we introduce two dispersibilities and prove their correlation, based on which we design the untargeted backdoor watermark under both poisoned-label and clean-label settings. We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification. Experiments on benchmark datasets verify the effectiveness of our methods and their resistance to existing backdoor defenses. Our codes are available at \url{https://github.com/THUYimingLi/Untargeted_Backdoor_Watermark}.
    AutoFed: Heterogeneity-Aware Federated Multimodal Learning for Robust Autonomous Driving. (arXiv:2302.08646v3 [cs.LG] UPDATED)
    Object detection with on-board sensors (e.g., lidar, radar, and camera) play a crucial role in autonomous driving (AD), and these sensors complement each other in modalities. While crowdsensing may potentially exploit these sensors (of huge quantity) to derive more comprehensive knowledge, \textit{federated learning} (FL) appears to be the necessary tool to reach this potential: it enables autonomous vehicles (AVs) to train machine learning models without explicitly sharing raw sensory data. However, the multimodal sensors introduce various data heterogeneity across distributed AVs (e.g., label quantity skews and varied modalities), posing critical challenges to effective FL. To this end, we present AutoFed as a heterogeneity-aware FL framework to fully exploit multimodal sensory data on AVs and thus enable robust AD. Specifically, we first propose a novel model leveraging pseudo-labeling to avoid mistakenly treating unlabeled objects as the background. We also propose an autoencoder-based data imputation method to fill missing data modality (of certain AVs) with the available ones. To further reconcile the heterogeneity, we finally present a client selection mechanism exploiting the similarities among client models to improve both training stability and convergence rate. Our experiments on benchmark dataset confirm that AutoFed substantially improves over status quo approaches in both precision and recall, while demonstrating strong robustness to adverse weather conditions.
    PopSparse: Accelerated block sparse matrix multiplication on IPU. (arXiv:2303.16999v1 [cs.LG])
    Reducing the computational cost of running large scale neural networks using sparsity has attracted great attention in the deep learning community. While much success has been achieved in reducing FLOP and parameter counts while maintaining acceptable task performance, achieving actual speed improvements has typically been much more difficult, particularly on general purpose accelerators (GPAs) such as NVIDIA GPUs using low precision number formats. In this work we introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs by leveraging both the unique hardware characteristics of IPUs as well as any block structure defined in the data. We target two different types of sparsity: static, where the sparsity pattern is fixed at compile-time; and dynamic, where it can change each time the model is run. We present benchmark results for matrix multiplication for both of these modes on IPU with a range of block sizes, matrix sizes and densities. Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels with large matrix size and block size. Furthermore, static sparsity in general outperforms dynamic sparsity. While previous work on GPAs has shown speedups only for very high sparsity (typically 99\% and above), the present work demonstrates that our static sparse implementation outperforms equivalent dense calculations in FP16 at lower sparsity (around 90%).  ( 3 min )
    ConStruct-VL: Data-Free Continual Structured VL Concepts Learning. (arXiv:2211.09790v2 [cs.LG] UPDATED)
    Recently, large-scale pre-trained Vision-and-Language (VL) foundation models have demonstrated remarkable capabilities in many zero-shot downstream tasks, achieving competitive results for recognizing objects defined by as little as short text prompts. However, it has also been shown that VL models are still brittle in Structured VL Concept (SVLC) reasoning, such as the ability to recognize object attributes, states, and inter-object relations. This leads to reasoning mistakes, which need to be corrected as they occur by teaching VL models the missing SVLC skills; often this must be done using private data where the issue was found, which naturally leads to a data-free continual (no task-id) VL learning setting. In this work, we introduce the first Continual Data-Free Structured VL Concepts Learning (ConStruct-VL) benchmark and show it is challenging for many existing data-free CL strategies. We, therefore, propose a data-free method comprised of a new approach of Adversarial Pseudo-Replay (APR) which generates adversarial reminders of past tasks from past task models. To use this method efficiently, we also propose a continual parameter-efficient Layered-LoRA (LaLo) neural architecture allowing no-memory-cost access to all past models at train time. We show this approach outperforms all data-free methods by as much as ~7% while even matching some levels of experience-replay (prohibitive for applications where data-privacy must be preserved). Our code is publicly available at https://github.com/jamessealesmith/ConStruct-VL
    On the Analysis of Computational Delays in Reinforcement Learning-based Rate Adaptation Algorithms. (arXiv:2303.17477v1 [cs.NI])
    Several research works have applied Reinforcement Learning (RL) algorithms to solve the Rate Adaptation (RA) problem in Wi-Fi networks. The dynamic nature of the radio link requires the algorithms to be responsive to changes in link quality. Delays in the execution of the algorithm may be detrimental to its performance, which in turn may decrease network performance. This aspect has been overlooked in the state of the art. In this paper, we present an analysis of common computational delays in RL-based RA algorithms, and propose a methodology that may be applied to reduce these computational delays and increase the efficiency of this type of algorithms. We apply the proposed methodology to an existing RL-based RA algorithm. The obtained experimental results indicate a reduction of one order of magnitude in the execution time of the algorithm, improving its responsiveness to link quality changes.  ( 2 min )
    Improving Small Molecule Generation using Mutual Information Machine. (arXiv:2208.09016v2 [cs.LG] UPDATED)
    We address the task of controlled generation of small molecules, which entails finding novel molecules with desired properties under certain constraints (e.g., similarity to a reference molecule). Here we introduce MolMIM, a probabilistic auto-encoder for small molecule drug discovery that learns an informative and clustered latent space. MolMIM is trained with Mutual Information Machine (MIM) learning, and provides a fixed length representation of variable length SMILES strings. Since encoder-decoder models can learn representations with ``holes'' of invalid samples, here we propose a novel extension to the training procedure which promotes a dense latent space, and allows the model to sample valid molecules from random perturbations of latent codes. We provide a thorough comparison of MolMIM to several variable-size and fixed-size encoder-decoder models, demonstrating MolMIM's superior generation as measured in terms of validity, uniqueness, and novelty. We then utilize CMA-ES, a naive black-box and gradient free search algorithm, over MolMIM's latent space for the task of property guided molecule optimization. We achieve state-of-the-art results in several constrained single property optimization tasks as well as in the challenging task of multi-objective optimization, improving over previous success rate SOTA by more than 5\% . We attribute the strong results to MolMIM's latent representation which clusters similar molecules in the latent space, whereas CMA-ES is often used as a baseline optimization method. We also demonstrate MolMIM to be favourable in a compute limited regime, making it an attractive model for such cases.
    Shapley Chains: Extending Shapley Values to Classifier Chains. (arXiv:2303.17243v1 [cs.LG])
    In spite of increased attention on explainable machine learning models, explaining multi-output predictions has not yet been extensively addressed. Methods that use Shapley values to attribute feature contributions to the decision making are one of the most popular approaches to explain local individual and global predictions. By considering each output separately in multi-output tasks, these methods fail to provide complete feature explanations. We propose Shapley Chains to overcome this issue by including label interdependencies in the explanation design process. Shapley Chains assign Shapley values as feature importance scores in multi-output classification using classifier chains, by separating the direct and indirect influence of these feature scores. Compared to existing methods, this approach allows to attribute a more complete feature contribution to the predictions of multi-output classification tasks. We provide a mechanism to distribute the hidden contributions of the outputs with respect to a given chaining order of these outputs. Moreover, we show how our approach can reveal indirect feature contributions missed by existing approaches. Shapley Chains help to emphasize the real learning factors in multi-output applications and allows a better understanding of the flow of information through output interdependencies in synthetic and real-world datasets.
    Inference in conditioned dynamics through causality restoration. (arXiv:2210.10179v2 [physics.data-an] UPDATED)
    Computing observables from conditioned dynamics is typically computationally hard, because, although obtaining independent samples efficiently from the unconditioned dynamics is usually feasible, generally most of the samples must be discarded (in a form of importance sampling) because they do not satisfy the imposed conditions. Sampling directly from the conditioned distribution is non-trivial, as conditioning breaks the causal properties of the dynamics which ultimately renders the sampling procedure efficient. One standard way of achieving it is through a Metropolis Monte-Carlo procedure, but this procedure is normally slow and a very large number of Monte-Carlo steps is needed to obtain a small number of statistically independent samples. In this work, we propose an alternative method to produce independent samples from a conditioned distribution. The method learns the parameters of a generalized dynamical model that optimally describe the conditioned distribution in a variational sense. The outcome is an effective, unconditioned, dynamical model, from which one can trivially obtain independent samples, effectively restoring causality of the conditioned distribution. The consequences are twofold: on the one hand, it allows us to efficiently compute observables from the conditioned dynamics by simply averaging over independent samples. On the other hand, the method gives an effective unconditioned distribution which is easier to interpret. The method is flexible and can be applied virtually to any dynamics. We discuss an important application of the method, namely the problem of epidemic risk assessment from (imperfect) clinical tests, for a large family of time-continuous epidemic models endowed with a Gillespie-like sampler. We show that the method compares favorably against the state of the art, including the soft-margin approach and mean-field methods.
    Improving Pareto Front Learning via Multi-Sample Hypernetworks. (arXiv:2212.01130v5 [cs.LG] UPDATED)
    Pareto Front Learning (PFL) was recently introduced as an effective approach to obtain a mapping function from a given trade-off vector to a solution on the Pareto front, which solves the multi-objective optimization (MOO) problem. Due to the inherent trade-off between conflicting objectives, PFL offers a flexible approach in many scenarios in which the decision makers can not specify the preference of one Pareto solution over another, and must switch between them depending on the situation. However, existing PFL methods ignore the relationship between the solutions during the optimization process, which hinders the quality of the obtained front. To overcome this issue, we propose a novel PFL framework namely PHN-HVI, which employs a hypernetwork to generate multiple solutions from a set of diverse trade-off preferences and enhance the quality of the Pareto front by maximizing the Hypervolume indicator defined by these solutions. The experimental results on several MOO machine learning tasks show that the proposed framework significantly outperforms the baselines in producing the trade-off Pareto front.
    Co-manipulation of soft-materials estimating deformation from depth images. (arXiv:2301.05609v3 [cs.RO] UPDATED)
    Human-robot co-manipulation of soft materials, such as fabrics, composites, and sheets of paper/cardboard, is a challenging operation that presents several relevant industrial applications. Estimating the deformation state of the co-manipulated material is one of the main challenges. Viable methods provide the indirect measure by calculating the human-robot relative distance. In this paper, we develop a data-driven model to estimate the deformation state of the material from a depth image through a Convolutional Neural Network (CNN). First, we define the deformation state of the material as the relative roto-translation from the current robot pose and a human grasping position. The model estimates the current deformation state through a Convolutional Neural Network, specifically a DenseNet-121 pretrained on ImageNet.The delta between the current and the desired deformation state is fed to the robot controller that outputs twist commands. The paper describes the developed approach to acquire, preprocess the dataset and train the model. The model is compared with the current state-of-the-art method based on a skeletal tracker from cameras. Results show that our approach achieves better performances and avoids the various drawbacks caused by using a skeletal tracker.Finally, we also studied the model performance according to different architectures and dataset dimensions to minimize the time required for dataset acquisition
    Geometric Repair for Fair Classification at Any Decision Threshold. (arXiv:2203.07490v3 [cs.LG] UPDATED)
    We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds. Specifically, we show that by decreasing the statistical distance between each group's score distributions, we can increase fair performance across all thresholds at once, and that we can do so without a significant decrease in accuracy. To this end, we introduce a formal measure of distributional parity, which captures the degree of similarity in the distributions of classifications for different protected groups. In contrast to prior work, which has been limited to studies of demographic parity across all thresholds, our measure applies to a large class of fairness metrics. Our main result is to put forward a novel post-processing algorithm based on optimal transport, which provably maximizes distributional parity. We support this result with experiments on several fairness benchmarks.
    Random Manifold Sampling and Joint Sparse Regularization for Multi-label Feature Selection. (arXiv:2204.06445v3 [stat.ML] UPDATED)
    Multi-label learning is usually used to mine the correlation between features and labels, and feature selection can retain as much information as possible through a small number of features. $\ell_{2,1}$ regularization method can get sparse coefficient matrix, but it can not solve multicollinearity problem effectively. The model proposed in this paper can obtain the most relevant few features by solving the joint constrained optimization problems of $\ell_{2,1}$ and $\ell_{F}$ regularization.In manifold regularization, we implement random walk strategy based on joint information matrix, and get a highly robust neighborhood graph.In addition, we given the algorithm for solving the model and proved its convergence.Comparative experiments on real-world data sets show that the proposed method outperforms other methods.
    Neural network architectures using min-plus algebra for solving certain high dimensional optimal control problems and Hamilton-Jacobi PDEs. (arXiv:2105.03336v2 [math.OC] UPDATED)
    Solving high dimensional optimal control problems and corresponding Hamilton-Jacobi PDEs are important but challenging problems in control engineering. In this paper, we propose two abstract neural network architectures which are respectively used to compute the value function and the optimal control for certain class of high dimensional optimal control problems. We provide the mathematical analysis for the two abstract architectures. We also show several numerical results computed using the deep neural network implementations of these abstract architectures. A preliminary implementation of our proposed neural network architecture on FPGAs shows promising speed up compared to CPUs. This work paves the way to leverage efficient dedicated hardware designed for neural networks to solve high dimensional optimal control problems and Hamilton-Jacobi PDEs.
    Multi-Realism Image Compression with a Conditional Generator. (arXiv:2212.13824v2 [cs.CV] UPDATED)
    By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to either reconstruct a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.
    Quantum Circuit Fidelity Improvement with Long Short-Term Memory Networks. (arXiv:2303.17523v1 [quant-ph])
    Quantum computing has entered the Noisy Intermediate-Scale Quantum (NISQ) era. Currently, the quantum processors we have are sensitive to environmental variables like radiation and temperature, thus producing noisy outputs. Although many proposed algorithms and applications exist for NISQ processors, we still face uncertainties when interpreting their noisy results. Specifically, how much confidence do we have in the quantum states we are picking as the output? This confidence is important since a NISQ computer will output a probability distribution of its qubit measurements, and it is sometimes hard to distinguish whether the distribution represents meaningful computation or just random noise. This paper presents a novel approach to attack this problem by framing quantum circuit fidelity prediction as a Time Series Forecasting problem, therefore making it possible to utilize the power of Long Short-Term Memory (LSTM) neural networks. A complete workflow to build the training circuit dataset and LSTM architecture is introduced, including an intuitive method of calculating the quantum circuit fidelity. The trained LSTM system, Q-fid, can predict the output fidelity of a quantum circuit running on a specific processor, without the need for any separate input of hardware calibration data or gate error rates. Evaluated on the QASMbench NISQ benchmark suite, Q-fid's prediction achieves an average RMSE of 0.0515, up to 24.7x more accurate than the default Qiskit transpile tool mapomatic. When used to find the high-fidelity circuit layouts from the available circuit transpilations, Q-fid predicts the fidelity for the top 10% layouts with an average RMSE of 0.0252, up to 32.8x more accurate than mapomatic.  ( 3 min )
    CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning. (arXiv:2211.13218v2 [cs.CV] UPDATED)
    Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data. Typical solutions for this continual learning problem require extensive rehearsal of previously seen data, which increases memory costs and may violate data privacy. Recently, the emergence of large-scale pre-trained vision transformer models has enabled prompting approaches as an alternative to data-rehearsal. These approaches rely on a key-query mechanism to generate prompts and have been found to be highly resistant to catastrophic forgetting in the well-established rehearsal-free continual learning setting. However, the key mechanism of these methods is not trained end-to-end with the task sequence. Our experiments show that this leads to a reduction in their plasticity, hence sacrificing new task accuracy, and inability to benefit from expanded parameter capacity. We instead propose to learn a set of prompt components which are assembled with input-conditioned weights to produce input-conditioned prompts, resulting in a novel attention-based end-to-end key-query scheme. Our experiments show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy. We also outperform the state of art by as much as 4.4% accuracy on a continual learning benchmark which contains both class-incremental and domain-incremental task shifts, corresponding to many practical settings. Our code is available at https://github.com/GT-RIPL/CODA-Prompt
    Towards Good Practices for Missing Modality Robust Action Recognition. (arXiv:2211.13916v2 [cs.CV] UPDATED)
    Standard multi-modal models assume the use of the same modalities in training and inference stages. However, in practice, the environment in which multi-modal models operate may not satisfy such assumption. As such, their performances degrade drastically if any modality is missing in the inference stage. We ask: how can we train a model that is robust to missing modalities? This paper seeks a set of good practices for multi-modal action recognition, with a particular interest in circumstances where some modalities are not available at an inference time. First, we study how to effectively regularize the model during training (e.g., data augmentation). Second, we investigate on fusion methods for robustness to missing modalities: we find that transformer-based fusion shows better robustness for missing modality than summation or concatenation. Third, we propose a simple modular network, ActionMAE, which learns missing modality predictive coding by randomly dropping modality features and tries to reconstruct them with the remaining modality features. Coupling these good practices, we build a model that is not only effective in multi-modal action recognition but also robust to modality missing. Our model achieves the state-of-the-arts on multiple benchmarks and maintains competitive performances even in missing modality scenarios. Codes are available at https://github.com/sangminwoo/ActionMAE.
    CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X. (arXiv:2303.17568v1 [cs.LG])
    Large pre-trained code generation models, such as OpenAI Codex, can generate syntax- and function-correct code, making the coding of programmers more productive and our pursuit of artificial general intelligence closer. In this paper, we introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation. CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages as of June 2022. Our extensive experiments suggest that CodeGeeX outperforms multilingual code models of similar scale for both the tasks of code generation and translation on HumanEval-X. Building upon HumanEval (Python only), we develop the HumanEval-X benchmark for evaluating multilingual models by hand-writing the solutions in C++, Java, JavaScript, and Go. In addition, we build CodeGeeX-based extensions on Visual Studio Code, JetBrains, and Cloud Studio, generating 4.7 billion tokens for tens of thousands of active users per week. Our user study demonstrates that CodeGeeX can help to increase coding efficiency for 83.4% of its users. Finally, CodeGeeX is publicly accessible and in Sep. 2022, we open-sourced its code, model weights (the version of 850B tokens), API, extensions, and HumanEval-X at https://github.com/THUDM/CodeGeeX.
    Conditional Antibody Design as 3D Equivariant Graph Translation. (arXiv:2208.06073v6 [q-bio.BM] UPDATED)
    Antibody design is valuable for therapeutic usage and biological research. Existing deep-learning-based methods encounter several key issues: 1) incomplete context for Complementarity-Determining Regions (CDRs) generation; 2) incapability of capturing the entire 3D geometry of the input structure; 3) inefficient prediction of the CDR sequences in an autoregressive manner. In this paper, we propose Multi-channel Equivariant Attention Network (MEAN) to co-design 1D sequences and 3D structures of CDRs. To be specific, MEAN formulates antibody design as a conditional graph translation problem by importing extra components including the target antigen and the light chain of the antibody. Then, MEAN resorts to E(3)-equivariant message passing along with a proposed attention mechanism to better capture the geometrical correlation between different components. Finally, it outputs both the 1D sequences and 3D structure via a multi-round progressive full-shot scheme, which enjoys more efficiency and precision against previous autoregressive approaches. Our method significantly surpasses state-of-the-art models in sequence and structure modeling, antigen-binding CDR design, and binding affinity optimization. Specifically, the relative improvement to baselines is about 23% in antigen-binding CDR design and 34% for affinity optimization.
    Student-centric Model of Learning Management System Activity and Academic Performance: from Correlation to Causation. (arXiv:2210.15430v3 [cs.CY] UPDATED)
    In recent years, there is a lot of interest in modeling students' digital traces in Learning Management System (LMS) to understand students' learning behavior patterns including aspects of meta-cognition and self-regulation, with the ultimate goal to turn those insights into actionable information to support students to improve their learning outcomes. In achieving this goal, however, there are two main issues that need to be addressed given the existing literature. Firstly, most of the current work is course-centered (i.e. models are built from data for a specific course) rather than student-centered; secondly, a vast majority of the models are correlational rather than causal. Those issues make it challenging to identify the most promising actionable factors for intervention at the student level where most of the campus-wide academic support is designed for. In this paper, we explored a student-centric analytical framework for LMS activity data that can provide not only correlational but causal insights mined from observational data. We demonstrated this approach using a dataset of 1651 computing major students at a public university in the US during one semester in the Fall of 2019. This dataset includes students' fine-grained LMS interaction logs and administrative data, e.g. demographics and academic performance. In addition, we expand the repository of LMS behavior indicators to include those that can characterize the time-of-the-day of login (e.g. chronotype). Our analysis showed that student login volume, compared with other login behavior indicators, is both strongly correlated and causally linked to student academic performance, especially among students with low academic performance. We envision that those insights will provide convincing evidence for college student support groups to launch student-centered and targeted interventions that are effective and scalable.
    Learning dynamical systems: an example from open quantum system dynamics. (arXiv:2211.06678v2 [quant-ph] UPDATED)
    Machine learning algorithms designed to learn dynamical systems from data can be used to forecast, control and interpret the observed dynamics. In this work we exemplify the use of one of such algorithms, namely Koopman operator learning, in the context of open quantum system dynamics. We will study the dynamics of a small spin chain coupled with dephasing gates and show how Koopman operator learning is an approach to efficiently learn not only the evolution of the density matrix, but also of every physical observable associated to the system. Finally, leveraging the spectral decomposition of the learned Koopman operator, we show how symmetries obeyed by the underlying dynamics can be inferred directly from data.
    GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection. (arXiv:2303.17334v1 [cs.LG])
    Along with the rapid evolution of mobile communication technologies, such as 5G, there has been a drastically increase in telecom fraud, which significantly dissipates individual fortune and social wealth. In recent years, graph mining techniques are gradually becoming a mainstream solution for detecting telecom fraud. However, the graph imbalance problem, caused by the Pareto principle, brings severe challenges to graph data mining. This is a new and challenging problem, but little previous work has been noticed. In this paper, we propose a Graph ATtention network with COst-sensitive BOosting (GAT-COBO) for the graph imbalance problem. First, we design a GAT-based base classifier to learn the embeddings of all nodes in the graph. Then, we feed the embeddings into a well-designed cost-sensitive learner for imbalanced learning. Next, we update the weights according to the misclassification cost to make the model focus more on the minority class. Finally, we sum the node embeddings obtained by multiple cost-sensitive learners to obtain a comprehensive node representation, which is used for the downstream anomaly detection task. Extensive experiments on two real-world telecom fraud detection datasets demonstrate that our proposed method is effective for the graph imbalance problem, outperforming the state-of-the-art GNNs and GNN-based fraud detectors. In addition, our model is also helpful for solving the widespread over-smoothing problem in GNNs. The GAT-COBO code and datasets are available at https://github.com/xxhu94/GAT-COBO.
    Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization. (arXiv:2202.06545v3 [cs.LG] UPDATED)
    In the sequential decision making setting, an agent aims to achieve systematic generalization over a large, possibly infinite, set of environments. Such environments are modeled as discrete Markov decision processes with both states and actions represented through a feature vector. The underlying structure of the environments allows the transition dynamics to be factored into two components: one that is environment-specific and another that is shared. Consider a set of environments that share the laws of motion as an example. In this setting, the agent can take a finite amount of reward-free interactions from a subset of these environments. The agent then must be able to approximately solve any planning task defined over any environment in the original set, relying on the above interactions only. Can we design a provably efficient algorithm that achieves this ambitious goal of systematic generalization? In this paper, we give a partially positive answer to this question. First, we provide a tractable formulation of systematic generalization by employing a causal viewpoint. Then, under specific structural assumptions, we provide a simple learning algorithm that guarantees any desired planning error up to an unavoidable sub-optimality term, while showcasing a polynomial sample complexity.
    MAgNET: A Graph U-Net Architecture for Mesh-Based Simulations. (arXiv:2211.00713v2 [cs.LG] UPDATED)
    In many cutting-edge applications, high-fidelity computational models prove too slow to be practical and are thus replaced by much faster surrogate models. Recently, deep learning techniques have become increasingly important in accelerating such predictions. However, they tend to falter when faced with larger and more complex problems. Therefore, this work introduces MAgNET: Multi-channel Aggregation Network, a novel geometric deep learning framework designed to operate on large-dimensional data of arbitrary structure (graph data). MAgNET is built upon the MAg (Multichannel Aggregation) operation, which generalizes the concept of multi-channel local operations in convolutional neural networks to arbitrary non-grid inputs. The MAg layers are interleaved with the proposed novel graph pooling/unpooling operations to form a graph U-Net architecture that is robust and can handle arbitrary complex meshes, efficiently performing supervised learning on large-dimensional graph-structured data. We demonstrate the predictive capabilities of MAgNET for several non-linear finite element simulations and provide open-source datasets and codes to facilitate future research.
    Video Probabilistic Diffusion Models in Projected Latent Space. (arXiv:2302.07685v2 [cs.CV] UPDATED)
    Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos still remains a challenge due to their high-dimensionality and complex temporal dynamics along with large spatial variations. Recent works on diffusion models have shown their potential to solve this challenge, yet they suffer from severe computation- and memory-inefficiency that limit the scalability. To handle this issue, we propose a novel generative model for videos, coined projected latent video diffusion models (PVDM), a probabilistic diffusion model which learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources. Specifically, PVDM is composed of two components: (a) an autoencoder that projects a given video as 2D-shaped latent vectors that factorize the complex cubic structure of video pixels and (b) a diffusion model architecture specialized for our new factorized latent space and the training/sampling procedure to synthesize videos of arbitrary length with a single model. Experiments on popular video generation datasets demonstrate the superiority of PVDM compared with previous video synthesis methods; e.g., PVDM obtains the FVD score of 639.7 on the UCF-101 long video (128 frames) generation benchmark, which improves 1773.4 of the prior state-of-the-art.
    PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models. (arXiv:2303.17546v1 [cs.CV])
    Image editing using diffusion models has witnessed extremely fast-paced growth recently. There are various ways in which previous works enable controlling and editing images. Some works use high-level conditioning such as text, while others use low-level conditioning. Nevertheless, most of them lack fine-grained control over the properties of the different objects present in the image, i.e. object-level image editing. In this work, we consider an image as a composition of multiple objects, each defined by various properties. Out of these properties, we identify structure and appearance as the most intuitive to understand and useful for editing purposes. We propose Structure-and-Appearance Paired Diffusion model (PAIR-Diffusion), which is trained using structure and appearance information explicitly extracted from the images. The proposed model enables users to inject a reference image's appearance into the input image at both the object and global levels. Additionally, PAIR-Diffusion allows editing the structure while maintaining the style of individual components of the image unchanged. We extensively evaluate our method on LSUN datasets and the CelebA-HQ face dataset, and we demonstrate fine-grained control over both structure and appearance at the object level. We also applied the method to Stable Diffusion to edit any real image at the object level.
    Faster Adaptive Momentum-Based Federated Methods for Distributed Composition Optimization. (arXiv:2211.01883v2 [cs.LG] UPDATED)
    Federated Learning is a popular distributed learning paradigm in machine learning. Meanwhile, composition optimization is an effective hierarchical learning model, which appears in many machine learning applications such as meta learning and robust learning. More recently, although a few federated composition optimization algorithms have been proposed, they still suffer from high sample and communication complexities. In the paper, thus, we propose a class of faster federated compositional optimization algorithms (i.e., MFCGD and AdaMFCGD) to solve the nonconvex distributed composition problems, which builds on the momentum-based variance reduced and local-SGD techniques. In particular, our adaptive algorithm (i.e., AdaMFCGD) uses a unified adaptive matrix to flexibly incorporate various adaptive learning rates. Moreover, we provide a solid theoretical analysis for our algorithms under non-i.i.d. setting, and prove our algorithms obtain a lower sample and communication complexities simultaneously than the existing federated compositional algorithms. Specifically, our algorithms obtain lower sample complexity of $\tilde{O}(\epsilon^{-3})$ with lower communication complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary solution. We conduct the numerical experiments on robust federated learning and distributed meta learning tasks to demonstrate the efficiency of our algorithms.
    CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization. (arXiv:2110.10921v2 [cs.LG] UPDATED)
    Deep convolutional neural networks are shown to be overkill with high parametric and computational redundancy in many application scenarios, and an increasing number of works have explored model pruning to obtain lightweight and efficient networks. However, most existing pruning approaches are driven by empirical heuristic and rarely consider the joint impact of channels, leading to unguaranteed and suboptimal performance. In this paper, we propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference. Utilizing class information from a few samples, CATRO measures the joint impact of multiple channels by feature space discriminations and consolidates the layer-wise impact of preserved channels. By formulating channel pruning as a submodular set function maximization problem, CATRO solves it efficiently via a two-stage greedy iterative optimization procedure. More importantly, we present theoretical justifications on convergence of CATRO and performance of pruned networks. Experimental results demonstrate that CATRO achieves higher accuracy with similar computation cost or lower computation cost with similar accuracy than other state-of-the-art channel pruning algorithms. In addition, because of its class-aware property, CATRO is suitable to prune efficient networks adaptively for various classification subtasks, enhancing handy deployment and usage of deep networks in real-world applications.  ( 3 min )
    Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers. (arXiv:2107.13163v3 [cs.LG] UPDATED)
    A common lens to theoretically study neural net architectures is to analyze the functions they can approximate. However, constructions from approximation theory may be unrealistic and therefore less meaningful. For example, a common unrealistic trick is to encode target function values using infinite precision. To address these issues, this work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability. We study SM approximation for two function classes: boolean circuits and Turing machines. We show that overparameterized feedforward neural nets can SM approximate boolean circuits with sample complexity depending only polynomially on the circuit size, not the size of the network. In addition, we show that transformers can SM approximate Turing machines with computation time bounded by $T$ with sample complexity polynomial in the alphabet size, state space size, and $\log (T)$. We also introduce new tools for analyzing generalization which provide much tighter sample complexities than the typical VC-dimension or norm-based bounds, which may be of independent interest.
    AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation. (arXiv:2203.09516v3 [cs.CV] UPDATED)
    Powerful priors allow us to perform inference with insufficient information. In this paper, we propose an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation. We model the distribution over 3D shapes as a non-sequential autoregressive distribution over a discretized, low-dimensional, symbolic grid-like latent representation of 3D shapes. This enables us to represent distributions over 3D shapes conditioned on information from an arbitrary set of spatially anchored query locations and thus perform shape completion in such arbitrary settings (e.g., generating a complete chair given only a view of the back leg). We also show that the learned autoregressive prior can be leveraged for conditional tasks such as single-view reconstruction and language-based generation. This is achieved by learning task-specific naive conditionals which can be approximated by light-weight models trained on minimal paired data. We validate the effectiveness of the proposed method using both quantitative and qualitative evaluation and show that the proposed method outperforms the specialized state-of-the-art methods trained for individual tasks. The project page with code and video visualizations can be found at https://yccyenchicheng.github.io/AutoSDF/.
    Semi-Parametric Inducing Point Networks and Neural Processes. (arXiv:2205.11718v2 [cs.LG] UPDATED)
    We introduce semi-parametric inducing point networks (SPIN), a general-purpose architecture that can query the training set at inference time in a compute-efficient manner. Semi-parametric architectures are typically more compact than parametric models, but their computational complexity is often quadratic. In contrast, SPIN attains linear complexity via a cross-attention mechanism between datapoints inspired by inducing point methods. Querying large training sets can be particularly useful in meta-learning, as it unlocks additional training signal, but often exceeds the scaling limits of existing models. We use SPIN as the basis of the Inducing Point Neural Process, a probabilistic model which supports large contexts in meta-learning and achieves high accuracy where existing models fail. In our experiments, SPIN reduces memory requirements, improves accuracy across a range of meta-learning tasks, and improves state-of-the-art performance on an important practical problem, genotype imputation.
    HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace. (arXiv:2303.17580v1 [cs.CL])
    Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence (AGI). While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a system that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., HuggingFace) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in HuggingFace, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in HuggingFace, HuggingGPT is able to cover numerous sophisticated AI tasks in different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards AGI.
    Explainable Intrusion Detection Systems Using Competitive Learning Techniques. (arXiv:2303.17387v1 [cs.CR])
    The current state of the art systems in Artificial Intelligence (AI) enabled intrusion detection use a variety of black box methods. These black box methods are generally trained using Error Based Learning (EBL) techniques with a focus on creating accurate models. These models have high performative costs and are not easily explainable. A white box Competitive Learning (CL) based eXplainable Intrusion Detection System (X-IDS) offers a potential solution to these problem. CL models utilize an entirely different learning paradigm than EBL approaches. This different learning process makes the CL family of algorithms innately explainable and less resource intensive. In this paper, we create an X-IDS architecture that is based on DARPA's recommendation for explainable systems. In our architecture we leverage CL algorithms like, Self Organizing Maps (SOM), Growing Self Organizing Maps (GSOM), and Growing Hierarchical Self Organizing Map (GHSOM). The resulting models can be data-mined to create statistical and visual explanations. Our architecture is tested using NSL-KDD and CIC-IDS-2017 benchmark datasets, and produces accuracies that are 1% - 3% less than EBL models. However, CL models are much more explainable than EBL models. Additionally, we use a pruning process that is able to significantly reduce the size of these CL based models. By pruning our models, we are able to increase prediction speeds. Lastly, we analyze the statistical and visual explanations generated by our architecture, and we give a strategy that users could use to help navigate the set of explanations. These explanations will help users build trust with an Intrusion Detection System (IDS), and allow users to discover ways to increase the IDS's potency.  ( 3 min )
    Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models. (arXiv:2303.17591v1 [cs.CV])
    The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry. The significant advances in text-to-image generation techniques have prompted global discussions on privacy, copyright, and safety, as numerous unauthorized personal IDs, content, artistic creations, and potentially harmful materials have been learned by these models and later utilized to generate and distribute uncontrolled content. To address this challenge, we propose \textbf{Forget-Me-Not}, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the \textbf{Memorization Score (M-Score)} and \textbf{ConceptBench} to measure the models' capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion and diversity through \textbf{concept correction and disentanglement}. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at \href{https://github.com/SHI-Labs/Forget-Me-Not}{https://github.com/SHI-Labs/Forget-Me-Not}.  ( 3 min )
    Pgx: Hardware-accelerated parallel game simulation for reinforcement learning. (arXiv:2303.17503v1 [cs.AI])
    We propose Pgx, a collection of board game simulators written in JAX. Thanks to auto-vectorization and Just-In-Time compilation of JAX, Pgx scales easily to thousands of parallel execution on GPU/TPU accelerators. We found that the simulation of Pgx on a single A100 GPU is 10x faster than that of existing reinforcement learning libraries. Pgx implements games considered vital benchmarks in artificial intelligence research, such as Backgammon, Shogi, and Go. Pgx is available at https://github.com/sotetsuk/pgx.
    Non-Invasive Fairness in Learning through the Lens of Data Drift. (arXiv:2303.17566v1 [cs.LG])
    Machine Learning (ML) models are widely employed to drive many modern data systems. While they are undeniably powerful tools, ML models often demonstrate imbalanced performance and unfair behaviors. The root of this problem often lies in the fact that different subpopulations commonly display divergent trends: as a learning algorithm tries to identify trends in the data, it naturally favors the trends of the majority groups, leading to a model that performs poorly and unfairly for minority populations. Our goal is to improve the fairness and trustworthiness of ML models by applying only non-invasive interventions, i.e., without altering the data or the learning algorithm. We use a simple but key insight: the divergence of trends between different populations, and, consecutively, between a learned model and minority populations, is analogous to data drift, which indicates the poor conformance between parts of the data and the trained model. We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data. Both our methods introduce novel ways to employ the recently-proposed data profiling primitive of Conformance Constraints. Our experimental evaluation over 7 real-world datasets shows that both DifFair and ConFair improve the fairness of ML models.
    Demystifying Misconceptions in Social Bots Research. (arXiv:2303.17251v1 [cs.SI])
    The science of social bots seeks knowledge and solutions to one of the most debated forms of online misinformation. Yet, social bots research is plagued by widespread biases, hyped results, and misconceptions that set the stage for ambiguities, unrealistic expectations, and seemingly irreconcilable findings. Overcoming such issues is instrumental towards ensuring reliable solutions and reaffirming the validity of the scientific method. In this contribution we revise some recent results in social bots research, highlighting and correcting factual errors as well as methodological and conceptual issues. More importantly, we demystify common misconceptions, addressing fundamental points on how social bots research is discussed. Our analysis surfaces the need to discuss misinformation research in a rigorous, unbiased, and responsible way. This article bolsters such effort by identifying and refuting common fallacious arguments used by both proponents and opponents of social bots research as well as providing indications on the correct methodologies and sound directions for future research in the field.  ( 2 min )
    DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning. (arXiv:2003.06777v5 [cs.CV] UPDATED)
    In this work, we develop methods for few-shot image classification from a new perspective of optimal matching between image regions. We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance. The EMD generates the optimal matching flows between structural elements that have the minimum matching cost, which is used to calculate the image distance for classification. To generate the important weights of elements in the EMD formulation, we design a cross-reference mechanism, which can effectively alleviate the adverse impact caused by the cluttered background and large intra-class appearance variations. To implement k-shot classification, we propose to learn a structured fully connected layer that can directly classify dense image representations with the EMD. Based on the implicit function theorem, the EMD can be inserted as a layer into the network for end-to-end training. Our extensive experiments validate the effectiveness of our algorithm which outperforms state-of-the-art methods by a significant margin on five widely used few-shot classification benchmarks, namely, miniImageNet, tieredImageNet, Fewshot-CIFAR100 (FC100), Caltech-UCSD Birds-200-2011 (CUB), and CIFAR-FewShot (CIFAR-FS). We also demonstrate the effectiveness of our method on the image retrieval task in our experiments.  ( 3 min )
    Surrogate Neural Networks for Efficient Simulation-based Trajectory Planning Optimization. (arXiv:2303.17468v1 [math.OC])
    This paper presents a novel methodology that uses surrogate models in the form of neural networks to reduce the computation time of simulation-based optimization of a reference trajectory. Simulation-based optimization is necessary when there is no analytical form of the system accessible, only input-output data that can be used to create a surrogate model of the simulation. Like many high-fidelity simulations, this trajectory planning simulation is very nonlinear and computationally expensive, making it challenging to optimize iteratively. Through gradient descent optimization, our approach finds the optimal reference trajectory for landing a hypersonic vehicle. In contrast to the large datasets used to create the surrogate models in prior literature, our methodology is specifically designed to minimize the number of simulation executions required by the gradient descent optimizer. We demonstrated this methodology to be more efficient than the standard practice of hand-tuning the inputs through trial-and-error or randomly sampling the input parameter space. Due to the intelligently selected input values to the simulation, our approach yields better simulation outcomes that are achieved more rapidly and to a higher degree of accuracy. Optimizing the hypersonic vehicle's reference trajectory is very challenging due to the simulation's extreme nonlinearity, but even so, this novel approach found a 74% better-performing reference trajectory compared to nominal, and the numerical results clearly show a substantial reduction in computation time for designing future trajectories.
    Smart Choices and the Selection Monad. (arXiv:2007.08926v8 [cs.LO] UPDATED)
    Describing systems in terms of choices and their resulting costs and rewards offers the promise of freeing algorithm designers and programmers from specifying how those choices should be made; in implementations, the choices can be realized by optimization techniques and, increasingly, by machine-learning methods. We study this approach from a programming-language perspective. We define two small languages that support decision-making abstractions: one with choices and rewards, and the other additionally with probabilities. We give both operational and denotational semantics. In the case of the second language we consider three denotational semantics, with varying degrees of correlation between possible program values and expected rewards. The operational semantics combine the usual semantics of standard constructs with optimization over spaces of possible execution strategies. The denotational semantics, which are compositional, rely on the selection monad, to handle choice, augmented with an auxiliary monad to handle other effects, such as rewards or probability. We establish adequacy theorems that the two semantics coincide in all cases. We also prove full abstraction at base types, with varying notions of observation in the probabilistic case corresponding to the various degrees of correlation. We present axioms for choice combined with rewards and probability, establishing completeness at base types for the case of rewards without probability.
    Clustered Graph Matching for Label Recovery and Graph Classification. (arXiv:2205.03486v2 [stat.ML] UPDATED)
    Given a collection of vertex-aligned networks and an additional label-shuffled network, we propose procedures for leveraging the signal in the vertex-aligned collection to recover the labels of the shuffled network. We consider matching the shuffled network to averages of the networks in the vertex-aligned collection at different levels of granularity. We demonstrate both in theory and practice that if the graphs come from different network classes, then clustering the networks into classes followed by matching the new graph to cluster-averages can yield higher fidelity matching performance than matching to the global average graph. Moreover, by minimizing the graph matching objective function with respect to each cluster average, this approach simultaneously classifies and recovers the vertex labels for the shuffled graph. These theoretical developments are further reinforced via an illuminating real data experiment matching human connectomes.
    The AI Act proposal: a new right to technical interpretability?. (arXiv:2303.17558v1 [cs.CY])
    The debate about the concept of the so called right to explanation in AI is the subject of a wealth of literature. It has focused, in the legal scholarship, on art. 22 GDPR and, in the technical scholarship, on techniques that help explain the output of a certain model (XAI). The purpose of this work is to investigate if the new provisions introduced by the proposal for a Regulation laying down harmonised rules on artificial intelligence (AI Act), in combination with Convention 108 plus and GDPR, are enough to indicate the existence of a right to technical explainability in the EU legal framework and, if not, whether the EU should include it in its current legislation. This is a preliminary work submitted to the online event organised by the Information Society Law Center and it will be later developed into a full paper.
    Online Learning and Disambiguations of Partial Concept Classes. (arXiv:2303.17578v1 [cs.LG])
    In a recent article, Alon, Hanneke, Holzman, and Moran (FOCS '21) introduced a unifying framework to study the learnability of classes of partial concepts. One of the central questions studied in their work is whether the learnability of a partial concept class is always inherited from the learnability of some ``extension'' of it to a total concept class. They showed this is not the case for PAC learning but left the problem open for the stronger notion of online learnability. We resolve this problem by constructing a class of partial concepts that is online learnable, but no extension of it to a class of total concepts is online learnable (or even PAC learnable).
    Information-theoretic limitations of data-based price discrimination. (arXiv:2204.12723v3 [cs.GT] UPDATED)
    This paper studies third-degree price discrimination (3PD) based on a random sample of valuation and covariate data, where the covariate is continuous, and the distribution of the data is unknown to the seller. The main results of this paper are twofold. The first set of results is pricing strategy independent and reveals the fundamental information-theoretic limitation of any data-based pricing strategy in revenue generation for two cases: 3PD and uniform pricing. The second set of results proposes the $K$-markets empirical revenue maximization (ERM) strategy and shows that the $K$-markets ERM and the uniform ERM strategies achieve the optimal rate of convergence in revenue to that generated by their respective true-distribution 3PD and uniform pricing optima. Our theoretical and numerical results suggest that the uniform (i.e., $1$-market) ERM strategy generates a larger revenue than the $K$-markets ERM strategy when the sample size is small enough, and vice versa.
    Packed-Ensembles for Efficient Uncertainty Estimation. (arXiv:2210.09184v2 [cs.LG] UPDATED)
    Deep Ensembles (DE) are a prominent approach for achieving excellent performance on key metrics such as accuracy, calibration, uncertainty estimation, and out-of-distribution detection. However, hardware limitations of real-world systems constrain to smaller ensembles and lower-capacity networks, significantly deteriorating their performance and properties. We introduce Packed-Ensembles (PE), a strategy to design and train lightweight structured ensembles by carefully modulating the dimension of their encoding space. We leverage grouped convolutions to parallelize the ensemble into a single shared backbone and forward pass to improve training and inference speeds. PE is designed to operate within the memory limits of a standard neural network. Our extensive research indicates that PE accurately preserves the properties of DE, such as diversity, and performs equally well in terms of accuracy, calibration, out-of-distribution detection, and robustness to distribution shift. We make our code available at https://github.com/ENSTA-U2IS/torch-uncertainty.
    Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study. (arXiv:2303.17114v1 [cs.NI])
    With the phenomenal success of diffusion models and ChatGPT, deep generation models (DGMs) have been experiencing explosive growth from 2022. Not limited to content generation, DGMs are also widely adopted in Internet of Things, Metaverse, and digital twin, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we firstly overview the generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate the issues of the conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs in managing wireless networks. Moreover, we conduct a case study on network economics, using the state-of-the-art DGM model, i.e., diffusion model, to generate effective contracts for incentivizing the mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for the further research.
    Data-driven multiscale modeling of subgrid parameterizations in climate models. (arXiv:2303.17496v1 [physics.ao-ph])
    Subgrid parameterizations, which represent physical processes occurring below the resolution of current climate models, are an important component in producing accurate, long-term predictions for the climate. A variety of approaches have been tested to design these components, including deep learning methods. In this work, we evaluate a proof of concept illustrating a multiscale approach to this prediction problem. We train neural networks to predict subgrid forcing values on a testbed model and examine improvements in prediction accuracy that can be obtained by using additional information in both fine-to-coarse and coarse-to-fine directions.
    Learning Human-to-Robot Handovers from Point Clouds. (arXiv:2303.17592v1 [cs.RO])
    We propose the first framework to learn control policies for vision-based human-to-robot handovers, a critical task for human-robot interaction. While research in Embodied AI has made significant progress in training robot agents in simulated environments, interacting with humans remains challenging due to the difficulties of simulating humans. Fortunately, recent research has developed realistic simulated environments for human-to-robot handovers. Leveraging this result, we introduce a method that is trained with a human-in-the-loop via a two-stage teacher-student framework that uses motion and grasp planning, reinforcement learning, and self-supervision. We show significant performance gains over baselines on a simulation benchmark, sim-to-sim transfer and sim-to-real transfer.
    Federated Stochastic Bandit Learning with Unobserved Context. (arXiv:2303.17043v1 [cs.LG])
    We study the problem of federated stochastic multi-arm contextual bandits with unknown contexts, in which M agents are faced with different bandits and collaborate to learn. The communication model consists of a central server and the agents share their estimates with the central server periodically to learn to choose optimal actions in order to minimize the total regret. We assume that the exact contexts are not observable and the agents observe only a distribution of the contexts. Such a situation arises, for instance, when the context itself is a noisy measurement or based on a prediction mechanism. Our goal is to develop a distributed and federated algorithm that facilitates collaborative learning among the agents to select a sequence of optimal actions so as to maximize the cumulative reward. By performing a feature vector transformation, we propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions. Finally, we validated the performance of our algorithm and compared it with another baseline approach using numerical simulations on synthetic data and on the real-world movielens dataset.
    MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations. (arXiv:2303.17156v1 [cs.LG])
    We study a new paradigm for sequential decision making, called offline Policy Learning from Observation (PLfO). Offline PLfO aims to learn policies using datasets with substandard qualities: 1) only a subset of trajectories is labeled with rewards, 2) labeled trajectories may not contain actions, 3) labeled trajectories may not be of high quality, and 4) the overall data may not have full coverage. Such imperfection is common in real-world learning scenarios, so offline PLfO encompasses many existing offline learning setups, including offline imitation learning (IL), ILfO, and reinforcement learning (RL). In this work, we present a generic approach, called Modality-agnostic Adversarial Hypothesis Adaptation for Learning from Observations (MAHALO), for offline PLfO. Built upon the pessimism concept in offline RL, MAHALO optimizes the policy using a performance lower bound that accounts for uncertainty due to the dataset's insufficient converge. We implement this idea by adversarially training data-consistent critic and reward functions in policy optimization, which forces the learned policy to be robust to the data deficiency. We show that MAHALO consistently outperforms or matches specialized algorithms across a variety of offline PLfO tasks in theory and experiments.
    OpenMix: Exploring Outlier Samples for Misclassification Detection. (arXiv:2303.17093v1 [cs.LG])
    Reliable confidence estimation for deep neural classifiers is a challenging yet fundamental requirement in high-stakes applications. Unfortunately, modern deep neural networks are often overconfident for their erroneous predictions. In this work, we exploit the easily available outlier samples, i.e., unlabeled samples coming from non-target classes, for helping detect misclassification errors. Particularly, we find that the well-known Outlier Exposure, which is powerful in detecting out-of-distribution (OOD) samples from unknown classes, does not provide any gain in identifying misclassification errors. Based on these observations, we propose a novel method called OpenMix, which incorporates open-world knowledge by learning to reject uncertain pseudo-samples generated via outlier transformation. OpenMix significantly improves confidence reliability under various scenarios, establishing a strong and unified framework for detecting both misclassified samples from known classes and OOD samples from unknown classes. The code is publicly available at https://github.com/Impression2805/OpenMix.
    Incremental Self-Supervised Learning Based on Transformer for Anomaly Detection and Localization. (arXiv:2303.17354v1 [cs.CV])
    In the machine learning domain, research on anomaly detection and localization within image data has garnered significant attention, particularly in practical applications such as industrial defect detection. While existing approaches predominantly rely on Convolutional Neural Networks (CNN) as their backbone network, we propose an innovative method based on the Transformer backbone network. Our approach employs a two-stage incremental learning strategy. In the first stage, we train a Masked Autoencoder (MAE) model exclusively on normal images. Subsequently, in the second stage, we implement pixel-level data augmentation techniques to generate corrupted normal images and their corresponding pixel labels. This process enables the model to learn how to repair corrupted regions and classify the state of each pixel. Ultimately, the model produces a pixel reconstruction error matrix and a pixel anomaly probability matrix, which are combined to create an anomaly scoring matrix that effectively identifies abnormal regions. When compared to several state-of-the-art CNN-based techniques, our method demonstrates superior performance on the MVTec AD dataset, achieving an impressive 97.6% AUC.
    Factoring the Matrix of Domination: A Critical Review and Reimagination of Intersectionality in AI Fairness. (arXiv:2303.17555v1 [cs.CY])
    Intersectionality is a critical framework that, through inquiry and praxis, allows us to examine how social inequalities persist through domains of structure and discipline. Given AI fairness' raison d'\^etre of ``fairness,'' we argue that adopting intersectionality as an analytical framework is pivotal to effectively operationalizing fairness. Through a critical review of how intersectionality is discussed in 30 papers from the AI fairness literature, we deductively and inductively: 1) map how intersectionality tenets operate within the AI fairness paradigm and 2) uncover gaps between the conceptualization and operationalization of intersectionality. We find that researchers overwhelmingly reduce intersectionality to optimizing for fairness metrics over demographic subgroups. They also fail to discuss their social context and when mentioning power, they mostly situate it only within the AI pipeline. We: 3) outline and assess the implications of these gaps for critical inquiry and praxis, and 4) provide actionable recommendations for AI fairness researchers to engage with intersectionality in their work by grounding it in AI epistemology.
    Relaxed Actor-Critic with Convergence Guarantees for Continuous-Time Optimal Control of Nonlinear Systems. (arXiv:1909.05402v2 [eess.SY] UPDATED)
    This paper presents the Relaxed Continuous-Time Actor-critic (RCTAC) algorithm, a method for finding the nearly optimal policy for nonlinear continuous-time (CT) systems with known dynamics and infinite horizon, such as the path-tracking control of vehicles. RCTAC has several advantages over existing adaptive dynamic programming algorithms for CT systems. It does not require the ``admissibility" of the initialized policy or the input-affine nature of controlled systems for convergence. Instead, given any initial policy, RCTAC can converge to an admissible, and subsequently nearly optimal policy for a general nonlinear system with a saturated controller. RCTAC consists of two phases: a warm-up phase and a generalized policy iteration phase. The warm-up phase minimizes the square of the Hamiltonian to achieve admissibility, while the generalized policy iteration phase relaxes the update termination conditions for faster convergence. The convergence and optimality of the algorithm are proven through Lyapunov analysis, and its effectiveness is demonstrated through simulations and real-world path-tracking tasks.
    A comparative evaluation of image-to-image translation methods for stain transfer in histopathology. (arXiv:2303.17009v1 [eess.IV])
    Image-to-image translation (I2I) methods allow the generation of artificial images that share the content of the original image but have a different style. With the advances in Generative Adversarial Networks (GANs)-based methods, I2I methods enabled the generation of artificial images that are indistinguishable from natural images. Recently, I2I methods were also employed in histopathology for generating artificial images of in silico stained tissues from a different type of staining. We refer to this process as stain transfer. The number of I2I variants is constantly increasing, which makes a well justified choice of the most suitable I2I methods for stain transfer challenging. In our work, we compare twelve stain transfer approaches, three of which are based on traditional and nine on GAN-based image processing methods. The analysis relies on complementary quantitative measures for the quality of image translation, the assessment of the suitability for deep learning-based tissue grading, and the visual evaluation by pathologists. Our study highlights the strengths and weaknesses of the stain transfer approaches, thereby allowing a rational choice of the underlying I2I algorithms. Code, data, and trained models for stain transfer between H&E and Masson's Trichrome staining will be made available online.  ( 3 min )
    DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents. (arXiv:2303.17071v1 [cs.CL])
    Large language models (LLMs) have emerged as valuable tools for many natural language understanding tasks. In safety-critical applications such as healthcare, the utility of these models is governed by their ability to generate outputs that are factually accurate and complete. In this work, we present dialog-enabled resolving agents (DERA). DERA is a paradigm made possible by the increased conversational abilities of LLMs, namely GPT-4. It provides a simple, interpretable forum for models to communicate feedback and iteratively improve output. We frame our dialog as a discussion between two agent types - a Researcher, who processes information and identifies crucial problem components, and a Decider, who has the autonomy to integrate the Researcher's information and makes judgments on the final output. We test DERA against three clinically-focused tasks. For medical conversation summarization and care plan generation, DERA shows significant improvement over the base GPT-4 performance in both human expert preference evaluations and quantitative metrics. In a new finding, we also show that GPT-4's performance (70%) on an open-ended version of the MedQA question-answering (QA) dataset (Jin et al. 2021, USMLE) is well above the passing level (60%), with DERA showing similar performance. We release the open-ended MEDQA dataset at https://github.com/curai/curai-research/tree/main/DERA.
    Contextual Combinatorial Bandits with Probabilistically Triggered Arms. (arXiv:2303.17110v1 [cs.LG])
    We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, we find our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB~setting, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.  ( 2 min )
    Polarity is all you need to learn and transfer faster. (arXiv:2303.17589v1 [cs.LG])
    Natural intelligences (NIs) thrive in a dynamic world - they learn quickly, sometimes with only a few samples. In contrast, Artificial intelligences (AIs) typically learn with prohibitive amount of training samples and computational power. What design principle difference between NI and AI could contribute to such a discrepancy? Here, we propose an angle from weight polarity: development processes initialize NIs with advantageous polarity configurations; as NIs grow and learn, synapse magnitudes update yet polarities are largely kept unchanged. We demonstrate with simulation and image classification tasks that if weight polarities are adequately set $\textit{a priori}$, then networks learn with less time and data. We also explicitly illustrate situations in which $\textit{a priori}$ setting the weight polarities is disadvantageous for networks. Our work illustrates the value of weight polarities from the perspective of statistical and computational efficiency during learning.  ( 2 min )
    Enhancing Continual Learning with Global Prototypes: Counteracting Negative Representation Drift. (arXiv:2205.12186v2 [cs.CL] UPDATED)
    Continual learning (CL) aims to learn a sequence of tasks over time, with data distributions shifting from one task to another. When training on new task data, data representations from old tasks may drift. Some negative representation drift can result in catastrophic forgetting, by causing the locally learned class prototypes and data representations to correlate poorly across tasks. To mitigate such representation drift, we propose a method that finds global prototypes to guide the learning, and learns data representations with the regularization of the self-supervised information. Specifically, for NLP tasks, we formulate each task in a masked language modeling style, and learn the task via a neighbor attention mechanism over a pre-trained language model. Experimental results show that our proposed method can learn fairly consistent representations with less representation drift, and significantly reduce catastrophic forgetting in CL without resampling data from past tasks.  ( 2 min )
    Recognition, recall, and retention of few-shot memories in large language models. (arXiv:2303.17557v1 [cs.CL])
    The training of modern large language models (LLMs) takes place in a regime where most training examples are seen only a few times by the model during the course of training. What does a model remember about such examples seen only a few times during training and how long does that memory persist in the face of continuous training with new examples? Here, we investigate these questions through simple recognition, recall, and retention experiments with LLMs. In recognition experiments, we ask if the model can distinguish the seen example from a novel example; in recall experiments, we ask if the model can correctly recall the seen example when cued by a part of it; and in retention experiments, we periodically probe the model's memory for the original examples as the model is trained continuously with new examples. We find that a single exposure is generally sufficient for a model to achieve near perfect accuracy even in very challenging recognition experiments. We estimate that the recognition performance of even small language models easily exceeds human recognition performance reported in similar experiments with humans (Shepard, 1967). Achieving near perfect recall takes more exposures, but most models can do it in just 3 exposures. The flip side of this remarkable capacity for fast learning is that precise memories are quickly overwritten: recall performance for the original examples drops steeply over the first 10 training updates with new examples, followed by a more gradual decline. Even after 100K updates, however, some of the original examples are still recalled near perfectly. A qualitatively similar retention pattern has been observed in human long-term memory retention studies before (Bahrick, 1984). Finally, recognition is much more robust to interference than recall and memory for natural language sentences is generally superior to memory for stimuli without structure.  ( 3 min )
    Elastic Weight Removal for Faithful and Abstractive Dialogue Generation. (arXiv:2303.17574v1 [cs.CL])
    Ideally, dialogue systems should generate responses that are faithful to the knowledge contained in relevant documents. However, many models generate hallucinated responses instead that contradict it or contain unverifiable information. To mitigate such undesirable behaviour, it has been proposed to fine-tune a `negative expert' on negative examples and subtract its parameters from those of a pre-trained model. However, intuitively, this does not take into account that some parameters are more responsible than others in causing hallucinations. Thus, we propose to weigh their individual importance via (an approximation of) the Fisher Information matrix, which measures the uncertainty of their estimate. We call this method Elastic Weight Removal (EWR). We evaluate our method -- using different variants of Flan-T5 as a backbone language model -- on multiple datasets for information-seeking dialogue generation and compare our method with state-of-the-art techniques for faithfulness, such as CTRL, Quark, DExperts, and Noisy Channel reranking. Extensive automatic and human evaluation shows that EWR systematically increases faithfulness at minor costs in terms of other metrics. However, we notice that only discouraging hallucinations may increase extractiveness, i.e. shallow copy-pasting of document spans, which can be undesirable. Hence, as a second main contribution, we show that our method can be extended to simultaneously discourage hallucinations and extractive responses. We publicly release the code for reproducing EWR and all baselines.
    BloombergGPT: A Large Language Model for Finance. (arXiv:2303.17564v1 [cs.LG])
    The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. As a next step, we plan to release training logs (Chronicles) detailing our experience in training BloombergGPT.  ( 2 min )
    Can I Trust My Simulation Model? Measuring the Quality of Business Process Simulation Models. (arXiv:2303.17463v1 [cs.SE])
    Business Process Simulation (BPS) is an approach to analyze the performance of business processes under different scenarios. For example, BPS allows us to estimate what would be the cycle time of a process if one or more resources became unavailable. The starting point of BPS is a process model annotated with simulation parameters (a BPS model). BPS models may be manually designed, based on information collected from stakeholders and empirical observations, or automatically discovered from execution data. Regardless of its origin, a key question when using a BPS model is how to assess its quality. In this paper, we propose a collection of measures to evaluate the quality of a BPS model w.r.t. its ability to replicate the observed behavior of the process. We advocate an approach whereby different measures tackle different process perspectives. We evaluate the ability of the proposed measures to discern the impact of modifications to a BPS model, and their ability to uncover the relative strengths and weaknesses of two approaches for automated discovery of BPS models. The evaluation shows that the measures not only capture how close a BPS model is to the observed behavior, but they also help us to identify sources of discrepancies.  ( 3 min )
    Machine Learning for Partial Differential Equations. (arXiv:2303.17078v1 [cs.LG])
    Partial differential equations (PDEs) are among the most universal and parsimonious descriptions of natural physical laws, capturing a rich variety of phenomenology and multi-scale physics in a compact and symbolic representation. This review will examine several promising avenues of PDE research that are being advanced by machine learning, including: 1) the discovery of new governing PDEs and coarse-grained approximations for complex natural and engineered systems, 2) learning effective coordinate systems and reduced-order models to make PDEs more amenable to analysis, and 3) representing solution operators and improving traditional numerical algorithms. In each of these fields, we summarize key advances, ongoing challenges, and opportunities for further development.
    DPP-based Client Selection for Federated Learning with Non-IID Data. (arXiv:2303.17358v1 [cs.LG])
    This paper proposes a client selection (CS) method to tackle the communication bottleneck of federated learning (FL) while concurrently coping with FL's data heterogeneity issue. Specifically, we first analyze the effect of CS in FL and show that FL training can be accelerated by adequately choosing participants to diversify the training dataset in each round of training. Based on this, we leverage data profiling and determinantal point process (DPP) sampling techniques to develop an algorithm termed Federated Learning with DPP-based Participant Selection (FL-DP$^3$S). This algorithm effectively diversifies the participants' datasets in each round of training while preserving their data privacy. We conduct extensive experiments to examine the efficacy of our proposed method. The results show that our scheme attains a faster convergence rate, as well as a smaller communication overhead than several baselines.  ( 2 min )
    HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices. (arXiv:2303.17218v1 [cs.AR])
    For Human Action Recognition tasks (HAR), 3D Convolutional Neural Networks have proven to be highly effective, achieving state-of-the-art results. This study introduces a novel streaming architecture based toolflow for mapping such models onto FPGAs considering the model's inherent characteristics and the features of the targeted FPGA device. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics, generating a design that minimizes the latency of the computation. The toolflow is comprised of a number of parts, including i) a 3D CNN parser, ii) a performance and resource model, iii) a scheduling algorithm for executing 3D models on the generated hardware, iv) a resource-aware optimization engine tailored for 3D models, v) an automated mapping to synthesizable code for FPGAs. The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs. Furthermore, the toolflow has produced high-performing results for 3D CNN models that have not been mapped to FPGAs before, demonstrating the potential of FPGA-based systems in this space. Overall, HARFLOW3D has demonstrated its ability to deliver competitive latency compared to a range of state-of-the-art hand-tuned approaches being able to achieve up to 5$\times$ better performance compared to some of the existing works.
    Audio-Visual Grouping Network for Sound Localization from Mixtures. (arXiv:2303.17056v1 [cs.CV])
    Sound source localization is a typical and challenging task that predicts the location of sound sources in a video. Previous single-source methods mainly used the audio-visual association as clues to localize sounding objects in each image. Due to the mixed property of multiple sound sources in the original space, there exist rare multi-source approaches to localizing multiple sources simultaneously, except for one recent work using a contrastive random walk in the graph with images and separated sound as nodes. Despite their promising performance, they can only handle a fixed number of sources, and they cannot learn compact class-aware representations for individual sources. To alleviate this shortcoming, in this paper, we propose a novel audio-visual grouping network, namely AVGN, that can directly learn category-wise semantic features for each source from the input audio mixture and image to localize multiple sources simultaneously. Specifically, our AVGN leverages learnable audio-visual class tokens to aggregate class-aware source features. Then, the aggregated semantic features for each source can be used as guidance to localize the corresponding visual regions. Compared to existing multi-source methods, our new framework can localize a flexible number of sources and disentangle category-aware audio-visual representations for individual sound sources. We conduct extensive experiments on MUSIC, VGGSound-Instruments, and VGG-Sound Sources benchmarks. The results demonstrate that the proposed AVGN can achieve state-of-the-art sounding object localization performance on both single-source and multi-source scenarios. Code is available at \url{https://github.com/stoneMo/AVGN}.
    Efficient distributed representations beyond negative sampling. (arXiv:2303.17475v1 [cs.LG])
    This article describes an efficient method to learn distributed representations, also known as embeddings. This is accomplished minimizing an objective function similar to the one introduced in the Word2Vec algorithm and later adopted in several works. The optimization computational bottleneck is the calculation of the softmax normalization constants for which a number of operations scaling quadratically with the sample size is required. This complexity is unsuited for large datasets and negative sampling is a popular workaround, allowing one to obtain distributed representations in linear time with respect to the sample size. Negative sampling consists, however, in a change of the loss function and hence solves a different optimization problem from the one originally proposed. Our contribution is to show that the sotfmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy to learn distributed representations. We test our approximation on two popular applications related to word and node embeddings. The results evidence competing performance in terms of accuracy with respect to negative sampling with a remarkably lower computational time.
    Neglected Free Lunch -- Learning Image Classifiers Using Annotation Byproducts. (arXiv:2303.17595v1 [cs.CV])
    Supervised learning of image classifiers distills human knowledge into a parametric model through pairs of images and corresponding labels (X,Y). We argue that this simple and widely used representation of human knowledge neglects rich auxiliary information from the annotation procedure, such as the time-series of mouse traces and clicks left after image selection. Our insight is that such annotation byproducts Z provide approximate human attention that weakly guides the model to focus on the foreground cues, reducing spurious correlations and discouraging shortcut learning. To verify this, we create ImageNet-AB and COCO-AB. They are ImageNet and COCO training sets enriched with sample-wise annotation byproducts, collected by replicating the respective original annotation tasks. We refer to the new paradigm of training models with annotation byproducts as learning using annotation byproducts (LUAB). We show that a simple multitask loss for regressing Z together with Y already improves the generalisability and robustness of the learned models. Compared to the original supervised learning, LUAB does not require extra annotation costs. ImageNet-AB and COCO-AB are at https://github.com/naver-ai/NeglectedFreeLunch.  ( 2 min )
    MOEF: Modeling Occasion Evolution in Frequency Domain for Promotion-Aware Click-Through Rate Prediction. (arXiv:2112.13747v6 [cs.LG] UPDATED)
    Promotions are becoming more important and prevalent in e-commerce to attract customers and boost sales, leading to frequent changes of occasions, which drives users to behave differently. In such situations, most existing Click-Through Rate (CTR) models can't generalize well to online serving due to distribution uncertainty of the upcoming occasion. In this paper, we propose a novel CTR model named MOEF for recommendations under frequent changes of occasions. Firstly, we design a time series that consists of occasion signals generated from the online business scenario. Since occasion signals are more discriminative in the frequency domain, we apply Fourier Transformation to sliding time windows upon the time series, obtaining a sequence of frequency spectrum which is then processed by Occasion Evolution Layer (OEL). In this way, a high-order occasion representation can be learned to handle the online distribution uncertainty. Moreover, we adopt multiple experts to learn feature representations from multiple aspects, which are guided by the occasion representation via an attention mechanism. Accordingly, a mixture of feature representations is obtained adaptively for different occasions to predict the final CTR. Experimental results on real-world datasets validate the superiority of MOEF and online A/B tests also show MOEF outperforms representative CTR models significantly.  ( 3 min )
    Language Models can Solve Computer Tasks. (arXiv:2303.17491v1 [cs.CL])
    Agents capable of carrying out general tasks on a computer can improve efficiency and productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally, such agents should be able to solve new computer tasks presented to them through natural language commands. However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this work, we show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme where the agent recursively criticizes and improves its output (RCI). The RCI approach significantly outperforms existing LLM methods for automating computer tasks and surpasses supervised learning (SL) and reinforcement learning (RL) approaches on the MiniWoB++ benchmark. RCI is competitive with the state-of-the-art SL+RL method, using only a handful of demonstrations per task rather than tens of thousands, and without a task-specific reward function. Furthermore, we demonstrate RCI prompting's effectiveness in enhancing LLMs' reasoning abilities on a suite of natural language reasoning tasks, outperforming chain of thought (CoT) prompting. We find that RCI combined with CoT performs better than either separately.  ( 2 min )
    Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions. (arXiv:2303.17396v1 [cs.LG])
    Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly. In contrast, we show that it is possible to use standard online off-policy algorithms for faster improvement. However, we find this approach may suffer from policy collapse, where the policy undergoes severe performance deterioration during initial online learning. We investigate the issue of policy collapse and how it relates to data diversity, algorithm choices and online replay distribution. Based on these insights, we propose a conservative policy optimization procedure that can achieve stable and sample-efficient online learning from offline pretraining.  ( 2 min )
    Leveraging joint sparsity in hierarchical Bayesian learning. (arXiv:2303.16954v1 [stat.ML])
    We present a hierarchical Bayesian learning approach to infer jointly sparse parameter vectors from multiple measurement vectors. Our model uses separate conditionally Gaussian priors for each parameter vector and common gamma-distributed hyper-parameters to enforce joint sparsity. The resulting joint-sparsity-promoting priors are combined with existing Bayesian inference methods to generate a new family of algorithms. Our numerical experiments, which include a multi-coil magnetic resonance imaging application, demonstrate that our new approach consistently outperforms commonly used hierarchical Bayesian methods.
    Using AI to Measure Parkinson's Disease Severity at Home. (arXiv:2303.17573v1 [cs.LG])
    We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (i.e., tapping fingers) in front of a webcam, and data from 250 global participants were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms to obtain objective measurements that align with the MDS-UPDRS guideline and are strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (0.53 MAE). The methodology can be replicated for similar motor tasks, providing the possibility of evaluating individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.
    Whose Opinions Do Language Models Reflect?. (arXiv:2303.17548v1 [cs.CL])
    Language models (LMs) are increasingly being used in open-ended contexts, where the opinions reflected by LMs in response to subjective queries can have a profound impact, both on user satisfaction, as well as shaping the views of society at large. In this work, we put forth a quantitative framework to investigate the opinions reflected by LMs -- by leveraging high-quality public opinion polls and their associated human responses. Using this framework, we create OpinionsQA, a new dataset for evaluating the alignment of LM opinions with those of 60 US demographic groups over topics ranging from abortion to automation. Across topics, we find substantial misalignment between the views reflected by current LMs and those of US demographic groups: on par with the Democrat-Republican divide on climate change. Notably, this misalignment persists even after explicitly steering the LMs towards particular demographic groups. Our analysis not only confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs, but also surfaces groups whose opinions are poorly reflected by current LMs (e.g., 65+ and widowed individuals). Our code and data are available at https://github.com/tatsu-lab/opinions_qa.
    Fairness-Aware Data Valuation for Supervised Learning. (arXiv:2303.16963v1 [cs.LG])
    Data valuation is a ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both performance and fairness of ML models. Thus, we propose Fairness-Aware Data vauatiOn (FADO), a data valuation framework that can be used to incorporate fairness concerns into a series of ML-related tasks (e.g., data pre-processing, exploratory data analysis, active learning). We propose an entropy-based data valuation metric suited to address our two-pronged goal of maximizing both performance and fairness, which is more computationally efficient than existing metrics. We then show how FADO can be applied as the basis for unfairness mitigation pre-processing techniques. Our methods achieve promising results -- up to a 40 p.p. improvement in fairness at a less than 1 p.p. loss in performance compared to a baseline -- and promote fairness in a data-centric way, where a deeper understanding of data quality takes center stage.
    Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure. (arXiv:2303.17276v1 [cs.AI])
    Increase in computational scale and fine-tuning has seen a dramatic improvement in the quality of outputs of large language models (LLMs) like GPT. Given that both GPT-3 and GPT-4 were trained on large quantities of human-generated text, we might ask to what extent their outputs reflect patterns of human thinking, both for correct and incorrect cases. The Erotetic Theory of Reason (ETR) provides a symbolic generative model of both human success and failure in thinking, across propositional, quantified, and probabilistic reasoning, as well as decision-making. We presented GPT-3, GPT-3.5, and GPT-4 with 61 central inference and judgment problems from a recent book-length presentation of ETR, consisting of experimentally verified data-points on human judgment and extrapolated data-points predicted by ETR, with correct inference patterns as well as fallacies and framing effects (the ETR61 benchmark). ETR61 includes classics like Wason's card task, illusory inferences, the decoy effect, and opportunity-cost neglect, among others. GPT-3 showed evidence of ETR-predicted outputs for 59% of these examples, rising to 77% in GPT-3.5 and 75% in GPT-4. Remarkably, the production of human-like fallacious judgments increased from 18% in GPT-3 to 33% in GPT-3.5 and 34% in GPT-4. This suggests that larger and more advanced LLMs may develop a tendency toward more human-like mistakes, as relevant thought patterns are inherent in human-produced training data. According to ETR, the same fundamental patterns are involved both in successful and unsuccessful ordinary reasoning, so that the "bad" cases could paradoxically be learned from the "good" cases. We further present preliminary evidence that ETR-inspired prompt engineering could reduce instances of these mistakes.
    Removing supervision in semantic segmentation with local-global matching and area balancing. (arXiv:2303.17410v1 [cs.CV])
    Removing supervision in semantic segmentation is still tricky. Current approaches can deal with common categorical patterns yet resort to multi-stage architectures. We design a novel end-to-end model leveraging local-global patch matching to predict categories, good localization, area and shape of objects for semantic segmentation. The local-global matching is, in turn, compelled by optimal transport plans fulfilling area constraints nearing a solution for exact shape prediction. Our model attains state-of-the-art in Weakly Supervised Semantic Segmentation, only image-level labels, with 75% mIoU on PascalVOC2012 val set and 46% on MS-COCO2014 val set. Dropping the image-level labels and clustering self-supervised learned features to yield pseudo-multi-level labels, we obtain an unsupervised model for semantic segmentation. We also attain state-of-the-art on Unsupervised Semantic Segmentation with 43.6% mIoU on PascalVOC2012 val set and 19.4% on MS-COCO2014 val set. The code is available at https://github.com/deepplants/PC2M.
    Three-way causal attribute partial order structure analysis. (arXiv:2303.17482v1 [cs.AI])
    As an emerging concept cognitive learning model, partial order formal structure analysis (POFSA) has been widely used in the field of knowledge processing. In this paper, we propose the method named three-way causal attribute partial order structure (3WCAPOS) to evolve the POFSA from set coverage to causal coverage in order to increase the interpretability and classification performance of the model. First, the concept of causal factor (CF) is proposed to evaluate the causal correlation between attributes and decision attributes in the formal decision context. Then, combining CF with attribute partial order structure, the concept of causal attribute partial order structure is defined and makes set coverage evolve into causal coverage. Finally, combined with the idea of three-way decision, 3WCAPOS is formed, which makes the purity of nodes in the structure clearer and the changes between levels more obviously. In addition, the experiments are carried out from the classification ability and the interpretability of the structure through the six datasets. Through these experiments, it is concluded the accuracy of 3WCAPOS is improved by 1% - 9% compared with classification and regression tree, and more interpretable and the processing of knowledge is more reasonable compared with attribute partial order structure.
    Specification-Guided Data Aggregation for Semantically Aware Imitation Learning. (arXiv:2303.17010v1 [cs.LG])
    Advancements in simulation and formal methods-guided environment sampling have enabled the rigorous evaluation of machine learning models in a number of safety-critical scenarios, such as autonomous driving. Application of these environment sampling techniques towards improving the learned models themselves has yet to be fully exploited. In this work, we introduce a novel method for improving imitation-learned models in a semantically aware fashion by leveraging specification-guided sampling techniques as a means of aggregating expert data in new environments. Specifically, we create a set of formal specifications as a means of partitioning the space of possible environments into semantically similar regions, and identify elements of this partition where our learned imitation behaves most differently from the expert. We then aggregate expert data on environments in these identified regions, leading to more accurate imitation of the expert's behavior semantics. We instantiate our approach in a series of experiments in the CARLA driving simulator, and demonstrate that our approach leads to models that are more accurate than those learned with other environment sampling methods.  ( 2 min )
    De-coupling and De-positioning Dense Self-supervised Learning. (arXiv:2303.16947v1 [cs.CV])
    Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. Although the dense features extracted by employing segmentation maps and bounding boxes allow networks to perform SSL for each object, we show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding. We address this by introducing three data augmentation strategies, and leveraging them in (i) a decoupling module that aims to robustify the network to variations in the object's surroundings, and (ii) a de-positioning module that encourages the network to discard positional object information. We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection. Our extensive experiments evidence the better generalization of our method compared to the SOTA dense SSL methods  ( 2 min )
    CADM: Confusion Model-based Detection Method for Real-drift in Chunk Data Stream. (arXiv:2303.16906v1 [cs.LG])
    Concept drift detection has attracted considerable attention due to its importance in many real-world applications such as health monitoring and fault diagnosis. Conventionally, most advanced approaches will be of poor performance when the evaluation criteria of the environment has changed (i.e. concept drift), either can only detect and adapt to virtual drift. In this paper, we propose a new approach to detect real-drift in the chunk data stream with limited annotations based on concept confusion. When a new data chunk arrives, we use both real labels and pseudo labels to update the model after prediction and drift detection. In this context, the model will be confused and yields prediction difference once drift occurs. We then adopt cosine similarity to measure the difference. And an adaptive threshold method is proposed to find the abnormal value. Experiments show that our method has a low false alarm rate and false negative rate with the utilization of different classifiers.  ( 2 min )
    Machine learning-based spin structure detection. (arXiv:2303.16905v1 [cs.LG])
    One of the most important magnetic spin structure is the topologically stabilised skyrmion quasi-particle. Its interesting physical properties make them candidates for memory and efficient neuromorphic computation schemes. For the device operation, detection of the position, shape, and size of skyrmions is required and magnetic imaging is typically employed. A frequently used technique is magneto-optical Kerr microscopy where depending on the samples material composition, temperature, material growing procedures, etc., the measurements suffer from noise, low-contrast, intensity gradients, or other optical artifacts. Conventional image analysis packages require manual treatment, and a more automatic solution is required. We report a convolutional neural network specifically designed for segmentation problems to detect the position and shape of skyrmions in our measurements. The network is tuned using selected techniques to optimize predictions and in particular the number of detected classes is found to govern the performance. The results of this study shows that a well-trained network is a viable method of automating data pre-processing in magnetic microscopy. The approach is easily extendable to other spin structures and other magnetic imaging methods.  ( 2 min )
    FeDiSa: A Semi-asynchronous Federated Learning Framework for Power System Fault and Cyberattack Discrimination. (arXiv:2303.16956v1 [cs.CR])
    With growing security and privacy concerns in the Smart Grid domain, intrusion detection on critical energy infrastructure has become a high priority in recent years. To remedy the challenges of privacy preservation and decentralized power zones with strategic data owners, Federated Learning (FL) has contemporarily surfaced as a viable privacy-preserving alternative which enables collaborative training of attack detection models without requiring the sharing of raw data. To address some of the technical challenges associated with conventional synchronous FL, this paper proposes FeDiSa, a novel Semi-asynchronous Federated learning framework for power system faults and cyberattack Discrimination which takes into account communication latency and stragglers. Specifically, we propose a collaborative training of deep auto-encoder by Supervisory Control and Data Acquisition sub-systems which upload their local model updates to a control centre, which then perform a semi-asynchronous model aggregation for a new global model parameters based on a buffer system and a preset cut-off time. Experiments on the proposed framework using publicly available industrial control systems datasets reveal superior attack detection accuracy whilst preserving data confidentiality and minimizing the adverse effects of communication latency and stragglers. Furthermore, we see a 35% improvement in training time, thus validating the robustness of our proposed method.  ( 3 min )
    Investigating and Mitigating the Side Effects of Noisy Views in Multi-view Clustering in Practical Scenarios. (arXiv:2303.17245v1 [cs.LG])
    Multi-view clustering (MvC) aims at exploring the category structure among multi-view data without label supervision. Multiple views provide more information than single views and thus existing MvC methods can achieve satisfactory performance. However, their performance might seriously degenerate when the views are noisy in practical scenarios. In this paper, we first formally investigate the drawback of noisy views and then propose a theoretically grounded deep MvC method (namely MvCAN) to address this issue. Specifically, we propose a novel MvC objective that enables un-shared parameters and inconsistent clustering predictions across multiple views to reduce the side effects of noisy views. Furthermore, a non-parametric iterative process is designed to generate a robust learning target for mining multiple views' useful information. Theoretical analysis reveals that MvCAN works by achieving the multi-view consistency, complementarity, and noise robustness. Finally, experiments on public datasets demonstrate that MvCAN outperforms state-of-the-art methods and is robust against the existence of noisy views.  ( 2 min )
    A Generative Modeling Approach Using Quantum Gates. (arXiv:2303.16955v1 [quant-ph])
    In recent years, quantum computing has emerged as a promising technology for solving complex computational problems. Generative modeling is a technique that allows us to learn and generate new data samples similar to the original dataset. In this paper, we propose a generative modeling approach using quantum gates to generate new samples from a given dataset. We start with a brief introduction to quantum computing and generative modeling. Then, we describe our proposed approach, which involves encoding the dataset into quantum states and using quantum gates to manipulate these states to generate new samples. We also provide mathematical details of our approach and demonstrate its effectiveness through experimental results on various datasets.  ( 2 min )
    Practical self-supervised continual learning with continual fine-tuning. (arXiv:2303.17235v1 [cs.LG])
    Self-supervised learning (SSL) has shown remarkable performance in computer vision tasks when trained offline. However, in a Continual Learning (CL) scenario where new data is introduced progressively, models still suffer from catastrophic forgetting. Retraining a model from scratch to adapt to newly generated data is time-consuming and inefficient. Previous approaches suggested re-purposing self-supervised objectives with knowledge distillation to mitigate forgetting across tasks, assuming that labels from all tasks are available during fine-tuning. In this paper, we generalize self-supervised continual learning in a practical setting where available labels can be leveraged in any step of the SSL process. With an increasing number of continual tasks, this offers more flexibility in the pre-training and fine-tuning phases. With Kaizen, we introduce a training architecture that is able to mitigate catastrophic forgetting for both the feature extractor and classifier with a carefully designed loss function. By using a set of comprehensive evaluation metrics reflecting different aspects of continual learning, we demonstrated that Kaizen significantly outperforms previous SSL models in competitive vision benchmarks, with up to 16.5% accuracy improvement on split CIFAR-100. Kaizen is able to balance the trade-off between knowledge retention and learning from new data with an end-to-end model, paving the way for practical deployment of continual learning systems.  ( 2 min )
    A New Deep Learning and XAI-Based Algorithm for Features Selection in Genomics. (arXiv:2303.16914v1 [q-bio.GN])
    In the field of functional genomics, the analysis of gene expression profiles through Machine and Deep Learning is increasingly providing meaningful insight into a number of diseases. The paper proposes a novel algorithm to perform Feature Selection on genomic-scale data, which exploits the reconstruction capabilities of autoencoders and an ad-hoc defined Explainable Artificial Intelligence-based score in order to select the most informative genes for diagnosis, prognosis, and precision medicine. Results of the application on a Chronic Lymphocytic Leukemia dataset evidence the effectiveness of the algorithm, by identifying and suggesting a set of meaningful genes for further medical investigation.  ( 2 min )
    A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision. (arXiv:2303.17376v1 [cs.CV])
    There has been a recent explosion of computer vision models which perform many tasks and are composed of an image encoder (usually a ViT) and an autoregressive decoder (usually a Transformer). However, most of this work simply presents one system and its results, leaving many questions regarding design decisions and trade-offs of such systems unanswered. In this work, we aim to provide such answers. We take a close look at autoregressive decoders for multi-task learning in multimodal computer vision, including classification, captioning, visual question answering, and optical character recognition. Through extensive systematic experiments, we study the effects of task and data mixture, training and regularization hyperparameters, conditioning type and specificity, modality combination, and more. Importantly, we compare these to well-tuned single-task baselines to highlight the cost incurred by multi-tasking. A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well. We call this setup locked-image tuning with decoder (LiT-decoder). It can be seen as teaching a decoder to interact with a pretrained vision model via natural language.  ( 2 min )
    The Graphical Nadaraya-Watson Estimator on Latent Position Models. (arXiv:2303.17229v1 [stat.ML])
    Given a graph with a subset of labeled nodes, we are interested in the quality of the averaging estimator which for an unlabeled node predicts the average of the observations of its labeled neighbours. We rigorously study concentration properties, variance bounds and risk bounds in this context. While the estimator itself is very simple and the data generating process is too idealistic for practical applications, we believe that our small steps will contribute towards the theoretical understanding of more sophisticated methods such as Graph Neural Networks.  ( 2 min )
    NN-Copula-CD: A Copula-Guided Interpretable Neural Network for Change Detection in Heterogeneous Remote Sensing Images. (arXiv:2303.17448v1 [cs.CV])
    Change detection (CD) in heterogeneous remote sensing images is a practical and challenging issue for real-life emergencies. In the past decade, the heterogeneous CD problem has significantly benefited from the development of deep neural networks (DNN). However, the data-driven DNNs always perform like a black box where the lack of interpretability limits the trustworthiness and controllability of DNNs in most practical CD applications. As a strong knowledge-driven tool to measure correlation between random variables, Copula theory has been introduced into CD, yet it suffers from non-robust CD performance without manual prior selection for Copula functions. To address the above issues, we propose a knowledge-data-driven heterogeneous CD method (NN-Copula-CD) based on the Copula-guided interpretable neural network. In our NN-Copula-CD, the mathematical characteristics of Copula are designed as the losses to supervise a simple fully connected neural network to learn the correlation between bi-temporal image patches, and then the changed regions are identified via binary classification for the correlation coefficients of all image patch pairs of the bi-temporal images. We conduct in-depth experiments on three datasets with multimodal images (e.g., Optical, SAR, and NIR), where the quantitative results and visualized analysis demonstrate both the effectiveness and interpretability of the proposed NN-Copula-CD.  ( 3 min )
    Sasaki Metric for Spline Models of Manifold-Valued Trajectories. (arXiv:2303.17299v1 [math.DG])
    We propose a generic spatiotemporal framework to analyze manifold-valued measurements, which allows for employing an intrinsic and computationally efficient Riemannian hierarchical model. Particularly, utilizing regression, we represent discrete trajectories in a Riemannian manifold by composite B\' ezier splines, propose a natural metric induced by the Sasaki metric to compare the trajectories, and estimate average trajectories as group-wise trends. We evaluate our framework in comparison to state-of-the-art methods within qualitative and quantitative experiments on hurricane tracks. Notably, our results demonstrate the superiority of spline-based approaches for an intensity classification of the tracks.  ( 2 min )
    Fast inference of latent space dynamics in huge relational event networks. (arXiv:2303.17460v1 [cs.SI])
    Relational events are a type of social interactions, that sometimes are referred to as dynamic networks. Its dynamics typically depends on emerging patterns, so-called endogenous variables, or external forces, referred to as exogenous variables. Comprehensive information on the actors in the network, especially for huge networks, is rare, however. A latent space approach in network analysis has been a popular way to account for unmeasured covariates that are driving network configurations. Bayesian and EM-type algorithms have been proposed for inferring the latent space, but both the sheer size many social network applications as well as the dynamic nature of the process, and therefore the latent space, make computations prohibitively expensive. In this work we propose a likelihood-based algorithm that can deal with huge relational event networks. We propose a hierarchical strategy for inferring network community dynamics embedded into an interpretable latent space. Node dynamics are described by smooth spline processes. To make the framework feasible for large networks we borrow from machine learning optimization methodology. Model-based clustering is carried out via a convex clustering penalization, encouraging shared trajectories for ease of interpretation. We propose a model-based approach for separating macro-microstructures and perform a hierarchical analysis within successive hierarchies. The method can fit millions of nodes on a public Colab GPU in a few minutes. The code and a tutorial are available in a Github repository.  ( 3 min )
    Invertible Convolution with Symmetric Paddings. (arXiv:2303.17361v1 [cs.CV])
    We show that symmetrically padded convolution can be analytically inverted via DFT. We comprehensively analyze several different symmetric and anti-symmetric padding modes and show that multiple cases exist where the inversion can be achieved. The implementation is available at \url{https://github.com/prclibo/iconv_dft}.  ( 2 min )
    DiffCollage: Parallel Generation of Large Content with Diffusion Models. (arXiv:2303.17076v1 [cs.CV])
    We present DiffCollage, a compositional diffusion model that can generate large content by leveraging diffusion models trained on generating pieces of the large content. Our approach is based on a factor graph representation where each factor node represents a portion of the content and a variable node represents their overlap. This representation allows us to aggregate intermediate outputs from diffusion models defined on individual nodes to generate content of arbitrary size and shape in parallel without resorting to an autoregressive generation procedure. We apply DiffCollage to various tasks, including infinite image generation, panorama image generation, and long-duration text-guided motion generation. Extensive experimental results with a comparison to strong autoregressive baselines verify the effectiveness of our approach.  ( 2 min )
    EPG-MGCN: Ego-Planning Guided Multi-Graph Convolutional Network for Heterogeneous Agent Trajectory Prediction. (arXiv:2303.17027v1 [cs.LG])
    To drive safely in complex traffic environments, autonomous vehicles need to make an accurate prediction of the future trajectories of nearby heterogeneous traffic agents (i.e., vehicles, pedestrians, bicyclists, etc). Due to the interactive nature, human drivers are accustomed to infer what the future situations will become if they are going to execute different maneuvers. To fully exploit the impacts of interactions, this paper proposes a ego-planning guided multi-graph convolutional network (EPG-MGCN) to predict the trajectories of heterogeneous agents using both historical trajectory information and ego vehicle's future planning information. The EPG-MGCN first models the social interactions by employing four graph topologies, i.e., distance graphs, visibility graphs, planning graphs and category graphs. Then, the planning information of the ego vehicle is encoded by both the planning graph and the subsequent planning-guided prediction module to reduce uncertainty in the trajectory prediction. Finally, a category-specific gated recurrent unit (CS-GRU) encoder-decoder is designed to generate future trajectories for each specific type of agents. Our network is evaluated on two real-world trajectory datasets: ApolloScape and NGSIM. The experimental results show that the proposed EPG-MGCN achieves state-of-the-art performance compared to existing methods.  ( 2 min )
    HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion. (arXiv:2303.17015v1 [cs.CV])
    Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly on implicit neural fields in order to synthesize new data. To this end, we propose HyperDiffusion, a novel approach for unconditional generative modeling of implicit neural fields. HyperDiffusion operates directly on MLP weights and generates new neural implicit fields encoded by synthesized MLP parameters. Specifically, a collection of MLPs is first optimized to faithfully represent individual data samples. Subsequently, a diffusion process is trained in this MLP weight space to model the underlying distribution of neural implicit fields. HyperDiffusion enables diffusion modeling over a implicit, compact, and yet high-fidelity representation of complex signals across 3D shapes and 4D mesh animations within one single unified framework.  ( 2 min )
    Are Neural Architecture Search Benchmarks Well Designed? A Deeper Look Into Operation Importance. (arXiv:2303.16938v1 [cs.LG])
    Neural Architecture Search (NAS) benchmarks significantly improved the capability of developing and comparing NAS methods while at the same time drastically reduced the computational overhead by providing meta-information about thousands of trained neural networks. However, tabular benchmarks have several drawbacks that can hinder fair comparisons and provide unreliable results. These usually focus on providing a small pool of operations in heavily constrained search spaces -- usually cell-based neural networks with pre-defined outer-skeletons. In this work, we conducted an empirical analysis of the widely used NAS-Bench-101, NAS-Bench-201 and TransNAS-Bench-101 benchmarks in terms of their generability and how different operations influence the performance of the generated architectures. We found that only a subset of the operation pool is required to generate architectures close to the upper-bound of the performance range. Also, the performance distribution is negatively skewed, having a higher density of architectures in the upper-bound range. We consistently found convolution layers to have the highest impact on the architecture's performance, and that specific combination of operations favors top-scoring architectures. These findings shed insights on the correct evaluation and comparison of NAS methods using NAS benchmarks, showing that directly searching on NAS-Bench-201, ImageNet16-120 and TransNAS-Bench-101 produces more reliable results than searching only on CIFAR-10. Furthermore, with this work we provide suggestions for future benchmark evaluations and design. The code used to conduct the evaluations is available at https://github.com/VascoLopes/NAS-Benchmark-Evaluation.  ( 3 min )
    Have it your way: Individualized Privacy Assignment for DP-SGD. (arXiv:2303.17046v1 [cs.LG])
    When training a machine learning model with differential privacy, one sets a privacy budget. This budget represents a maximal privacy violation that any user is willing to face by contributing their data to the training set. We argue that this approach is limited because different users may have different privacy expectations. Thus, setting a uniform privacy budget across all points may be overly conservative for some users or, conversely, not sufficiently protective for others. In this paper, we capture these preferences through individualized privacy budgets. To demonstrate their practicality, we introduce a variant of Differentially Private Stochastic Gradient Descent (DP-SGD) which supports such individualized budgets. DP-SGD is the canonical approach to training models with differential privacy. We modify its data sampling and gradient noising mechanisms to arrive at our approach, which we call Individualized DP-SGD (IDP-SGD). Because IDP-SGD provides privacy guarantees tailored to the preferences of individual users and their data points, we find it empirically improves privacy-utility trade-offs.  ( 2 min )
    Multifactor Sequential Disentanglement via Structured Koopman Autoencoders. (arXiv:2303.17264v1 [cs.LG])
    Disentangling complex data to its latent factors of variation is a fundamental task in representation learning. Existing work on sequential disentanglement mostly provides two factor representations, i.e., it separates the data to time-varying and time-invariant factors. In contrast, we consider multifactor disentanglement in which multiple (more than two) semantic disentangled components are generated. Key to our approach is a strong inductive bias where we assume that the underlying dynamics can be represented linearly in the latent space. Under this assumption, it becomes natural to exploit the recently introduced Koopman autoencoder models. However, disentangled representations are not guaranteed in Koopman approaches, and thus we propose a novel spectral loss term which leads to structured Koopman matrices and disentanglement. Overall, we propose a simple and easy to code new deep model that is fully unsupervised and it supports multifactor disentanglement. We showcase new disentangling abilities such as swapping of individual static factors between characters, and an incremental swap of disentangled factors from the source to the target. Moreover, we evaluate our method extensively on two factor standard benchmark tasks where we significantly improve over competing unsupervised approaches, and we perform competitively in comparison to weakly- and self-supervised state-of-the-art approaches. The code is available at https://github.com/azencot-group/SKD.  ( 2 min )
    Training Neural Networks is NP-Hard in Fixed Dimension. (arXiv:2303.17045v1 [cs.CC])
    We study the parameterized complexity of training two-layer neural networks with respect to the dimension of the input data and the number of hidden neurons, considering ReLU and linear threshold activation functions. Albeit the computational complexity of these problems has been studied numerous times in recent years, several questions are still open. We answer questions by Arora et al. [ICLR '18] and Khalife and Basu [IPCO '22] showing that both problems are NP-hard for two dimensions, which excludes any polynomial-time algorithm for constant dimension. We also answer a question by Froese et al. [JAIR '22] proving W[1]-hardness for four ReLUs (or two linear threshold neurons) with zero training error. Finally, in the ReLU case, we show fixed-parameter tractability for the combined parameter number of dimensions and number of ReLUs if the network is assumed to compute a convex map. Our results settle the complexity status regarding these parameters almost completely.  ( 2 min )
    Training Feedforward Neural Networks with Bayesian Hyper-Heuristics. (arXiv:2303.16912v1 [cs.LG])
    The process of training feedforward neural networks (FFNNs) can benefit from an automated process where the best heuristic to train the network is sought out automatically by means of a high-level probabilistic-based heuristic. This research introduces a novel population-based Bayesian hyper-heuristic (BHH) that is used to train feedforward neural networks (FFNNs). The performance of the BHH is compared to that of ten popular low-level heuristics, each with different search behaviours. The chosen heuristic pool consists of classic gradient-based heuristics as well as meta-heuristics (MHs). The empirical process is executed on fourteen datasets consisting of classification and regression problems with varying characteristics. The BHH is shown to be able to train FFNNs well and provide an automated method for finding the best heuristic to train the FFNNs at various stages of the training process.  ( 2 min )
    The secret of immersion: actor driven camera movement generation for auto-cinematography. (arXiv:2303.17041v1 [cs.MM])
    Immersion plays a vital role when designing cinematic creations, yet the difficulty in immersive shooting prevents designers to create satisfactory outputs. In this work, we analyze the specific components that contribute to cinematographic immersion considering spatial, emotional, and aesthetic level, while these components are then combined into a high-level evaluation mechanism. Guided by such a immersion mechanism, we propose a GAN-based camera control system that is able to generate actor-driven camera movements in the 3D virtual environment to obtain immersive film sequences. The proposed encoder-decoder architecture in the generation flow transfers character motion into camera trajectory conditioned on an emotion factor. This ensures spatial and emotional immersion by performing actor-camera synchronization physically and psychologically. The emotional immersion is further strengthened by incorporating regularization that controls camera shakiness for expressing different mental statuses. To achieve aesthetic immersion, we make effort to improve aesthetic frame compositions by modifying the synthesized camera trajectory. Based on a self-supervised adjustor, the adjusted camera placements can project the character to the appropriate on-frame locations following aesthetic rules. The experimental results indicate that our proposed camera control system can efficiently offer immersive cinematic videos, both quantitatively and qualitatively, based on a fine-grained immersive shooting. Live examples are shown in the supplementary video.  ( 2 min )
    Severity classification of ground-glass opacity via 2-D convolutional neural network and lung CT scans: a 3-day exploration. (arXiv:2303.16904v1 [eess.IV])
    Ground-glass opacity is a hallmark of numerous lung diseases, including patients with COVID19 and pneumonia. This brief note presents experimental results of a proof-of-concept framework that got implemented and tested over three days as driven by the third challenge entitled "COVID-19 Competition", hosted at the AI-Enabled Medical Image Analysis Workshop of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023). Using a newly built virtual environment (created on March 17, 2023), we investigated various pre-trained two-dimensional convolutional neural networks (CNN) such as Dense Neural Network, Residual Neural Networks (ResNet), and Vision Transformers, as well as the extent of fine-tuning. Based on empirical experiments, we opted to fine-tune them using ADAM's optimization algorithm with a standard learning rate of 0.001 for all CNN architectures and apply early-stopping whenever the validation loss reached a plateau. For each trained CNN, the model state with the best validation accuracy achieved during training was stored and later reloaded for new classifications of unseen samples drawn from the validation set provided by the challenge organizers. According to the organizers, few of these 2D CNNs yielded performance comparable to an architecture that combined ResNet and Recurrent Neural Network (Gated Recurrent Units). As part of the challenge requirement, the source code produced during the course of this exercise is posted at https://github.com/lisatwyw/cov19. We also hope that other researchers may find this light prototype consisting of few Python files based on PyTorch 1.13.1 and TorchVision 0.14.1 approachable.  ( 3 min )
    Mixed Autoencoder for Self-supervised Visual Representation Learning. (arXiv:2303.17152v1 [cs.CV])
    Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks via randomly masking image patches and reconstruction. However, effective data augmentation strategies for MAE still remain open questions, different from those in contrastive learning that serve as the most important part. This paper studies the prevailing mixing augmentation for MAE. We first demonstrate that naive mixing will in contrast degenerate model performance due to the increase of mutual information (MI). To address, we propose homologous recognition, an auxiliary pretext task, not only to alleviate the MI increasement by explicitly requiring each patch to recognize homologous patches, but also to perform object-aware self-supervised pre-training for better downstream dense perception performance. With extensive experiments, we demonstrate that our proposed Mixed Autoencoder (MixedAE) achieves the state-of-the-art transfer results among masked image modeling (MIM) augmentations on different downstream tasks with significant efficiency. Specifically, our MixedAE outperforms MAE by +0.3% accuracy, +1.7 mIoU and +0.9 AP on ImageNet-1K, ADE20K and COCO respectively with a standard ViT-Base. Moreover, MixedAE surpasses iBOT, a strong MIM method combined with instance discrimination, while accelerating training by 2x. To our best knowledge, this is the very first work to consider mixing for MIM from the perspective of pretext task design. Code will be made available.  ( 2 min )
    Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models. (arXiv:2303.17109v1 [stat.ML])
    This paper deals with the problem of efficient sampling from a stochastic differential equation, given the drift function and the diffusion matrix. The proposed approach leverages a recent model for probabilities \citep{rudi2021psd} (the positive semi-definite -- PSD model) from which it is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a cost that is $m^2 d \log(1/\varepsilon)$ where $m$ is the dimension of the model, $d$ the dimension of the space. The proposed approach consists in: first, computing the PSD model that satisfies the Fokker-Planck equation (or its fractional variant) associated with the SDE, up to error $\varepsilon$, and then sampling from the resulting PSD model. Assuming some regularity of the Fokker-Planck solution (i.e. $\beta$-times differentiability plus some geometric condition on its zeros) We obtain an algorithm that: (a) in the preparatory phase obtains a PSD model with L2 distance $\varepsilon$ from the solution of the equation, with a model of dimension $m = \varepsilon^{-(d+1)/(\beta-2s)} (\log(1/\varepsilon))^{d+1}$ where $0<s\leq1$ is the fractional power to the Laplacian, and total computational complexity of $O(m^{3.5} \log(1/\varepsilon))$ and then (b) for Fokker-Planck equation, it is able to produce i.i.d.\ samples with error $\varepsilon$ in Wasserstein-1 distance, with a cost that is $O(d \varepsilon^{-2(d+1)/\beta-2} \log(1/\varepsilon)^{2d+3})$ per sample. This means that, if the probability associated with the SDE is somewhat regular, i.e. $\beta \geq 4d+2$, then the algorithm requires $O(\varepsilon^{-0.88} \log(1/\varepsilon)^{4.5d})$ in the preparatory phase, and $O(\varepsilon^{-1/2}\log(1/\varepsilon)^{2d+2})$ for each sample. Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.  ( 3 min )
    Does Sparsity Help in Learning Misspecified Linear Bandits?. (arXiv:2303.16998v1 [cs.LG])
    Recently, the study of linear misspecified bandits has generated intriguing implications of the hardness of learning in bandits and reinforcement learning (RL). In particular, Du et al. (2020) show that even if a learner is given linear features in $\mathbb{R}^d$ that approximate the rewards in a bandit or RL with a uniform error of $\varepsilon$, searching for an $O(\varepsilon)$-optimal action requires pulling at least $\Omega(\exp(d))$ queries. Furthermore, Lattimore et al. (2020) show that a degraded $O(\varepsilon\sqrt{d})$-optimal solution can be learned within $\operatorname{poly}(d/\varepsilon)$ queries. Yet it is unknown whether a structural assumption on the ground-truth parameter, such as sparsity, could break the $\varepsilon\sqrt{d}$ barrier. In this paper, we address this question by showing that algorithms can obtain $O(\varepsilon)$-optimal actions by querying $O(\varepsilon^{-s}d^s)$ actions, where $s$ is the sparsity parameter, removing the $\exp(d)$-dependence. We then establish information-theoretical lower bounds, i.e., $\Omega(\exp(s))$, to show that our upper bound on sample complexity is nearly tight if one demands an error $ O(s^{\delta}\varepsilon)$ for $0<\delta<1$. For $\delta\geq 1$, we further show that $\operatorname{poly}(s/\varepsilon)$ queries are possible when the linear features are "good" and even in general settings. These results provide a nearly complete picture of how sparsity can help in misspecified bandit learning and provide a deeper understanding of when linear features are "useful" for bandit and reinforcement learning with misspecification.  ( 2 min )
    Meta-Learning Parameterized First-Order Optimizers using Differentiable Convex Optimization. (arXiv:2303.16952v1 [cs.LG])
    Conventional optimization methods in machine learning and controls rely heavily on first-order update rules. Selecting the right method and hyperparameters for a particular task often involves trial-and-error or practitioner intuition, motivating the field of meta-learning. We generalize a broad family of preexisting update rules by proposing a meta-learning framework in which the inner loop optimization step involves solving a differentiable convex optimization (DCO). We illustrate the theoretical appeal of this approach by showing that it enables one-step optimization of a family of linear least squares problems, given that the meta-learner has sufficient exposure to similar tasks. Various instantiations of the DCO update rule are compared to conventional optimizers on a range of illustrative experimental settings.  ( 2 min )
    Ideal Abstractions for Decision-Focused Learning. (arXiv:2303.17062v1 [cs.LG])
    We present a methodology for formulating simplifying abstractions in machine learning systems by identifying and harnessing the utility structure of decisions. Machine learning tasks commonly involve high-dimensional output spaces (e.g., predictions for every pixel in an image or node in a graph), even though a coarser output would often suffice for downstream decision-making (e.g., regions of an image instead of pixels). Developers often hand-engineer abstractions of the output space, but numerous abstractions are possible and it is unclear how the choice of output space for a model impacts its usefulness in downstream decision-making. We propose a method that configures the output space automatically in order to minimize the loss of decision-relevant information. Taking a geometric perspective, we formulate a step of the algorithm as a projection of the probability simplex, termed fold, that minimizes the total loss of decision-related information in the H-entropy sense. Crucially, learning in the abstracted outcome space requires less data, leading to a net improvement in decision quality. We demonstrate the method in two domains: data acquisition for deep neural network training and a closed-loop wildfire management task.  ( 2 min )
    The G-invariant graph Laplacian. (arXiv:2303.17001v1 [cs.LG])
    Graph Laplacian based algorithms for data lying on a manifold have been proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data point not only lie on a manifold, but are also closed under the action of a continuous group. An example of such data set is volumes that line on a low dimensional manifold, where each volume may be rotated in three-dimensional space. We introduce the G-invariant graph Laplacian that generalizes the graph Laplacian by accounting for the action of the group on the data set. We show that like the standard graph Laplacian, the G-invariant graph Laplacian converges to the Laplace-Beltrami operator on the data manifold, but with a significantly improved convergence rate. Furthermore, we show that the eigenfunctions of the G-invariant graph Laplacian admit the form of tensor products between the group elements and eigenvectors of certain matrices, which can be computed efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group SU(2).  ( 2 min )
    Mole Recruitment: Poisoning of Image Classifiers via Selective Batch Sampling. (arXiv:2303.17080v1 [cs.LG])
    In this work, we present a data poisoning attack that confounds machine learning models without any manipulation of the image or label. This is achieved by simply leveraging the most confounding natural samples found within the training data itself, in a new form of a targeted attack coined "Mole Recruitment." We define moles as the training samples of a class that appear most similar to samples of another class, and show that simply restructuring training batches with an optimal number of moles can lead to significant degradation in the performance of the targeted class. We show the efficacy of this novel attack in an offline setting across several standard image classification datasets, and demonstrate the real-world viability of this attack in a continual learning (CL) setting. Our analysis reveals that state-of-the-art models are susceptible to Mole Recruitment, thereby exposing a previously undetected vulnerability of image classifiers.  ( 2 min )
    Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams. (arXiv:2303.17003v1 [cs.CL])
    The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino M\'edio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities. This exam poses challenging tasks for LMs, since its questions may span into multiple fields of knowledge, requiring understanding of information from diverse domains. For instance, a question may require comprehension of both statistics and biology to be solved. This work analyzed responses generated by GPT-3.5 and GPT-4 models for questions presented in the 2009-2017 exams, as well as for questions of the 2022 exam, which were made public after the training of the models was completed. Furthermore, different prompt strategies were tested, including the use of Chain-of-Thought (CoT) prompts to generate explanations for answers. On the 2022 edition, the best-performing model, GPT-4 with CoT, achieved an accuracy of 87%, largely surpassing GPT-3.5 by 11 points. The code and data used on experiments are available at https://github.com/piresramon/gpt-4-enem.  ( 2 min )
    Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+. (arXiv:2303.16982v1 [physics.chem-ph])
    Recent developments in deep learning have made remarkable progress in speeding up the prediction of quantum chemical (QC) properties by removing the need for expensive electronic structure calculations like density functional theory. However, previous methods that relied on 1D SMILES sequences or 2D molecular graphs failed to achieve high accuracy as QC properties are primarily dependent on the 3D equilibrium conformations optimized by electronic structure methods. In this paper, we propose a novel approach called Uni-Mol+ to tackle this challenge. Firstly, given a 2D molecular graph, Uni-Mol+ generates an initial 3D conformation from inexpensive methods such as RDKit. Then, the initial conformation is iteratively optimized to its equilibrium conformation, and the optimized conformation is further used to predict the QC properties. All these steps are automatically learned using Transformer models. We observed the quality of the optimized conformation is crucial for QC property prediction performance. To effectively optimize conformation, we introduce a two-track Transformer model backbone in Uni-Mol+ and train it together with the QC property prediction task. We also design a novel training approach called linear trajectory injection to ensure proper supervision for the Uni-Mol+ learning process. Our extensive benchmarking results demonstrate that the proposed Uni-Mol+ significantly improves the accuracy of QC property prediction. We have made the code and model publicly available at \url{https://github.com/dptech-corp/Uni-Mol}.  ( 2 min )
    Sparse joint shift in multinomial classification. (arXiv:2303.16971v1 [stat.ML])
    Sparse joint shift (SJS) was recently proposed as a tractable model for general dataset shift which may cause changes to the marginal distributions of features and labels as well as the posterior probabilities and the class-conditional feature distributions. Fitting SJS for a target dataset without label observations may produce valid predictions of labels and estimates of class prior probabilities. We present new results on the transmission of SJS from sets of features to larger sets of features, a conditional correction formula for the class posterior probabilities under the target distribution, identifiability of SJS, and the relationship between SJS and covariate shift. In addition, we point out inconsistencies in the algorithms which were proposed for estimating the characteristics of SJS, as they could hamper the search for optimal solutions.  ( 2 min )
  • Open

    Packed-Ensembles for Efficient Uncertainty Estimation. (arXiv:2210.09184v2 [cs.LG] UPDATED)
    Deep Ensembles (DE) are a prominent approach for achieving excellent performance on key metrics such as accuracy, calibration, uncertainty estimation, and out-of-distribution detection. However, hardware limitations of real-world systems constrain to smaller ensembles and lower-capacity networks, significantly deteriorating their performance and properties. We introduce Packed-Ensembles (PE), a strategy to design and train lightweight structured ensembles by carefully modulating the dimension of their encoding space. We leverage grouped convolutions to parallelize the ensemble into a single shared backbone and forward pass to improve training and inference speeds. PE is designed to operate within the memory limits of a standard neural network. Our extensive research indicates that PE accurately preserves the properties of DE, such as diversity, and performs equally well in terms of accuracy, calibration, out-of-distribution detection, and robustness to distribution shift. We make our code available at https://github.com/ENSTA-U2IS/torch-uncertainty.
    Physics-informed Information Field Theory for Modeling Physical Systems with Uncertainty Quantification. (arXiv:2301.07609v2 [stat.ML] UPDATED)
    Data-driven approaches coupled with physical knowledge are powerful techniques to model systems. The goal of such models is to efficiently solve for the underlying field by combining measurements with known physical laws. As many systems contain unknown elements, such as missing parameters, noisy data, or incomplete physical laws, this is widely approached as an uncertainty quantification problem. The common techniques to handle all the variables typically depend on the numerical scheme used to approximate the posterior, and it is desirable to have a method which is independent of any such discretization. Information field theory (IFT) provides the tools necessary to perform statistics over fields that are not necessarily Gaussian. We extend IFT to physics-informed IFT (PIFT) by encoding the functional priors with information about the physical laws which describe the field. The posteriors derived from this PIFT remain independent of any numerical scheme and can capture multiple modes, allowing for the solution of problems which are ill-posed. We demonstrate our approach through an analytical example involving the Klein-Gordon equation. We then develop a variant of stochastic gradient Langevin dynamics to draw samples from the joint posterior over the field and model parameters. We apply our method to numerical examples with various degrees of model-form error and to inverse problems involving nonlinear differential equations. As an addendum, the method is equipped with a metric which allows the posterior to automatically quantify model-form uncertainty. Because of this, our numerical experiments show that the method remains robust to even an incorrect representation of the physics given sufficient data. We numerically demonstrate that the method correctly identifies when the physics cannot be trusted, in which case it automatically treats learning the field as a regression problem.
    Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under ManhattanWorld AssumptionWithout Ambiguity. (arXiv:2303.17166v1 [cs.CV])
    In orthogonal world coordinates, a Manhattan world lying along cuboid buildings is widely useful for various computer vision tasks. However, the Manhattan world has much room for improvement because the origin of pan angles from an image is arbitrary, that is, four-fold rotational symmetric ambiguity of pan angles. To address this problem, we propose a definition for the pan-angle origin based on the directions of the roads with respect to a camera and the direction of travel. We propose a learning-based calibration method that uses heatmap regression to remove the ambiguity by each direction of labeled image coordinates, similar to pose estimation keypoints. Simultaneously, our two-branched network recovers the rotation and removes fisheye distortion from a general scene image. To alleviate the lack of vanishing points in images, we introduce auxiliary diagonal points that have the optimal 3D arrangement of spatial uniformity. Extensive experiments demonstrated that our method outperforms conventional methods on large-scale datasets and with off-the-shelf cameras.  ( 2 min )
    The Graphical Nadaraya-Watson Estimator on Latent Position Models. (arXiv:2303.17229v1 [stat.ML])
    Given a graph with a subset of labeled nodes, we are interested in the quality of the averaging estimator which for an unlabeled node predicts the average of the observations of its labeled neighbours. We rigorously study concentration properties, variance bounds and risk bounds in this context. While the estimator itself is very simple and the data generating process is too idealistic for practical applications, we believe that our small steps will contribute towards the theoretical understanding of more sophisticated methods such as Graph Neural Networks.  ( 2 min )
    Validation of uncertainty quantification metrics: a primer based on the consistency and adaptivity concepts. (arXiv:2303.07170v2 [physics.chem-ph] UPDATED)
    The practice of uncertainty quantification (UQ) validation, notably in machine learning for the physico-chemical sciences, rests on several graphical methods (scattering plots, calibration curves, reliability diagrams and confidence curves) which explore complementary aspects of calibration, without covering all the desirable ones. For instance, none of these methods deals with the reliability of UQ metrics across the range of input features (adaptivity). Based on the complementary concepts of consistency and adaptivity, the toolbox of common validation methods for variance- and intervals- based UQ metrics is revisited with the aim to provide a better grasp on their capabilities. This study is conceived as an introduction to UQ validation, and all methods are derived from a few basic rules. The methods are illustrated and tested on synthetic datasets and representative examples extracted from the recent physico-chemical machine learning UQ literature.  ( 2 min )
    Random Manifold Sampling and Joint Sparse Regularization for Multi-label Feature Selection. (arXiv:2204.06445v3 [stat.ML] UPDATED)
    Multi-label learning is usually used to mine the correlation between features and labels, and feature selection can retain as much information as possible through a small number of features. $\ell_{2,1}$ regularization method can get sparse coefficient matrix, but it can not solve multicollinearity problem effectively. The model proposed in this paper can obtain the most relevant few features by solving the joint constrained optimization problems of $\ell_{2,1}$ and $\ell_{F}$ regularization.In manifold regularization, we implement random walk strategy based on joint information matrix, and get a highly robust neighborhood graph.In addition, we given the algorithm for solving the model and proved its convergence.Comparative experiments on real-world data sets show that the proposed method outperforms other methods.  ( 2 min )
    Out-of-sample error estimate for robust M-estimators with convex penalty. (arXiv:2008.11840v5 [math.ST] UPDATED)
    A generic out-of-sample error estimate is proposed for robust $M$-estimators regularized with a convex penalty in high-dimensional linear regression where $(X,y)$ is observed and $p,n$ are of the same order. If $\psi$ is the derivative of the robust data-fitting loss $\rho$, the estimate depends on the observed data only through the quantities $\hat\psi = \psi(y-X\hat\beta)$, $X^\top \hat\psi$ and the derivatives $(\partial/\partial y) \hat\psi$ and $(\partial/\partial y) X\hat\beta$ for fixed $X$. The out-of-sample error estimate enjoys a relative error of order $n^{-1/2}$ in a linear model with Gaussian covariates and independent noise, either non-asymptotically when $p/n\le \gamma$ or asymptotically in the high-dimensional asymptotic regime $p/n\to\gamma'\in(0,\infty)$. General differentiable loss functions $\rho$ are allowed provided that $\psi=\rho'$ is 1-Lipschitz. The validity of the out-of-sample error estimate holds either under a strong convexity assumption, or for the $\ell_1$-penalized Huber M-estimator if the number of corrupted observations and sparsity of the true $\beta$ are bounded from above by $s_*n$ for some small enough constant $s_*\in(0,1)$ independent of $n,p$. For the square loss and in the absence of corruption in the response, the results additionally yield $n^{-1/2}$-consistent estimates of the noise variance and of the generalization error. This generalizes, to arbitrary convex penalty, estimates that were previously known for the Lasso.  ( 3 min )
    Fast inference of latent space dynamics in huge relational event networks. (arXiv:2303.17460v1 [cs.SI])
    Relational events are a type of social interactions, that sometimes are referred to as dynamic networks. Its dynamics typically depends on emerging patterns, so-called endogenous variables, or external forces, referred to as exogenous variables. Comprehensive information on the actors in the network, especially for huge networks, is rare, however. A latent space approach in network analysis has been a popular way to account for unmeasured covariates that are driving network configurations. Bayesian and EM-type algorithms have been proposed for inferring the latent space, but both the sheer size many social network applications as well as the dynamic nature of the process, and therefore the latent space, make computations prohibitively expensive. In this work we propose a likelihood-based algorithm that can deal with huge relational event networks. We propose a hierarchical strategy for inferring network community dynamics embedded into an interpretable latent space. Node dynamics are described by smooth spline processes. To make the framework feasible for large networks we borrow from machine learning optimization methodology. Model-based clustering is carried out via a convex clustering penalization, encouraging shared trajectories for ease of interpretation. We propose a model-based approach for separating macro-microstructures and perform a hierarchical analysis within successive hierarchies. The method can fit millions of nodes on a public Colab GPU in a few minutes. The code and a tutorial are available in a Github repository.  ( 3 min )
    Optimal Experimental Design for Staggered Rollouts. (arXiv:1911.03764v5 [econ.EM] UPDATED)
    In this paper, we study the design and analysis of experiments conducted on a set of units over multiple time periods where the starting time of the treatment may vary by unit. The design problem involves selecting an initial treatment time for each unit in order to most precisely estimate both the instantaneous and cumulative effects of the treatment. We first consider non-adaptive experiments, where all treatment assignment decisions are made prior to the start of the experiment. For this case, we show that the optimization problem is generally NP-hard, and we propose a near-optimal solution. Under this solution, the fraction entering treatment each period is initially low, then high, and finally low again. Next, we study an adaptive experimental design problem, where both the decision to continue the experiment and treatment assignment decisions are updated after each period's data is collected. For the adaptive case, we propose a new algorithm, the Precision-Guided Adaptive Experiment (PGAE) algorithm, that addresses the challenges at both the design stage and at the stage of estimating treatment effects, ensuring valid post-experiment inference accounting for the adaptive nature of the design. Using realistic settings, we demonstrate that our proposed solutions can reduce the opportunity cost of the experiments by over 50%, compared to static design benchmarks.  ( 3 min )
    Training Neural Networks is NP-Hard in Fixed Dimension. (arXiv:2303.17045v1 [cs.CC])
    We study the parameterized complexity of training two-layer neural networks with respect to the dimension of the input data and the number of hidden neurons, considering ReLU and linear threshold activation functions. Albeit the computational complexity of these problems has been studied numerous times in recent years, several questions are still open. We answer questions by Arora et al. [ICLR '18] and Khalife and Basu [IPCO '22] showing that both problems are NP-hard for two dimensions, which excludes any polynomial-time algorithm for constant dimension. We also answer a question by Froese et al. [JAIR '22] proving W[1]-hardness for four ReLUs (or two linear threshold neurons) with zero training error. Finally, in the ReLU case, we show fixed-parameter tractability for the combined parameter number of dimensions and number of ReLUs if the network is assumed to compute a convex map. Our results settle the complexity status regarding these parameters almost completely.  ( 2 min )
    Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers. (arXiv:2107.13163v3 [cs.LG] UPDATED)
    A common lens to theoretically study neural net architectures is to analyze the functions they can approximate. However, constructions from approximation theory may be unrealistic and therefore less meaningful. For example, a common unrealistic trick is to encode target function values using infinite precision. To address these issues, this work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability. We study SM approximation for two function classes: boolean circuits and Turing machines. We show that overparameterized feedforward neural nets can SM approximate boolean circuits with sample complexity depending only polynomially on the circuit size, not the size of the network. In addition, we show that transformers can SM approximate Turing machines with computation time bounded by $T$ with sample complexity polynomial in the alphabet size, state space size, and $\log (T)$. We also introduce new tools for analyzing generalization which provide much tighter sample complexities than the typical VC-dimension or norm-based bounds, which may be of independent interest.  ( 3 min )
    Variational Wasserstein Barycenters for Geometric Clustering. (arXiv:2002.10543v2 [cs.LG] UPDATED)
    We propose to compute Wasserstein barycenters (WBs) by solving for Monge maps with variational principle. We discuss the metric properties of WBs and explore their connections, especially the connections of Monge WBs, to K-means clustering and co-clustering. We also discuss the feasibility of Monge WBs on unbalanced measures and spherical domains. We propose two new problems -- regularized K-means and Wasserstein barycenter compression. We demonstrate the use of VWBs in solving these clustering-related problems.  ( 2 min )
    Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models. (arXiv:2303.17109v1 [stat.ML])
    This paper deals with the problem of efficient sampling from a stochastic differential equation, given the drift function and the diffusion matrix. The proposed approach leverages a recent model for probabilities \citep{rudi2021psd} (the positive semi-definite -- PSD model) from which it is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a cost that is $m^2 d \log(1/\varepsilon)$ where $m$ is the dimension of the model, $d$ the dimension of the space. The proposed approach consists in: first, computing the PSD model that satisfies the Fokker-Planck equation (or its fractional variant) associated with the SDE, up to error $\varepsilon$, and then sampling from the resulting PSD model. Assuming some regularity of the Fokker-Planck solution (i.e. $\beta$-times differentiability plus some geometric condition on its zeros) We obtain an algorithm that: (a) in the preparatory phase obtains a PSD model with L2 distance $\varepsilon$ from the solution of the equation, with a model of dimension $m = \varepsilon^{-(d+1)/(\beta-2s)} (\log(1/\varepsilon))^{d+1}$ where $0<s\leq1$ is the fractional power to the Laplacian, and total computational complexity of $O(m^{3.5} \log(1/\varepsilon))$ and then (b) for Fokker-Planck equation, it is able to produce i.i.d.\ samples with error $\varepsilon$ in Wasserstein-1 distance, with a cost that is $O(d \varepsilon^{-2(d+1)/\beta-2} \log(1/\varepsilon)^{2d+3})$ per sample. This means that, if the probability associated with the SDE is somewhat regular, i.e. $\beta \geq 4d+2$, then the algorithm requires $O(\varepsilon^{-0.88} \log(1/\varepsilon)^{4.5d})$ in the preparatory phase, and $O(\varepsilon^{-1/2}\log(1/\varepsilon)^{2d+2})$ for each sample. Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.  ( 3 min )
    Clustered Graph Matching for Label Recovery and Graph Classification. (arXiv:2205.03486v2 [stat.ML] UPDATED)
    Given a collection of vertex-aligned networks and an additional label-shuffled network, we propose procedures for leveraging the signal in the vertex-aligned collection to recover the labels of the shuffled network. We consider matching the shuffled network to averages of the networks in the vertex-aligned collection at different levels of granularity. We demonstrate both in theory and practice that if the graphs come from different network classes, then clustering the networks into classes followed by matching the new graph to cluster-averages can yield higher fidelity matching performance than matching to the global average graph. Moreover, by minimizing the graph matching objective function with respect to each cluster average, this approach simultaneously classifies and recovers the vertex labels for the shuffled graph. These theoretical developments are further reinforced via an illuminating real data experiment matching human connectomes.  ( 2 min )
    Approximation bounds for norm constrained neural networks with applications to regression and GANs. (arXiv:2201.09418v3 [cs.LG] UPDATED)
    This paper studies the approximation capacity of ReLU neural networks with norm constraint on the weights. We prove upper and lower bounds on the approximation error of these networks for smooth function classes. The lower bound is derived through the Rademacher complexity of neural networks, which may be of independent interest. We apply these approximation bounds to analyze the convergences of regression using norm constrained neural networks and distribution estimation by GANs. In particular, we obtain convergence rates for over-parameterized neural networks. It is also shown that GANs can achieve optimal rate of learning probability distributions, when the discriminator is a properly chosen norm constrained neural network.  ( 2 min )
    Sublinear Convergence Rates of Extragradient-Type Methods: A Survey on Classical and Recent Developments. (arXiv:2303.17192v1 [math.OC])
    The extragradient (EG), introduced by G. M. Korpelevich in 1976, is a well-known method to approximate solutions of saddle-point problems and their extensions such as variational inequalities and monotone inclusions. Over the years, numerous variants of EG have been proposed and studied in the literature. Recently, these methods have gained popularity due to new applications in machine learning and robust optimization. In this work, we survey the latest developments in the EG method and its variants for approximating solutions of nonlinear equations and inclusions, with a focus on the monotonicity and co-hypomonotonicity settings. We provide a unified convergence analysis for different classes of algorithms, with an emphasis on sublinear best-iterate and last-iterate convergence rates. We also discuss recent accelerated variants of EG based on both Halpern fixed-point iteration and Nesterov's accelerated techniques. Our approach uses simple arguments and basic mathematical tools to make the proofs as elementary as possible, while maintaining generality to cover a broad range of problems.  ( 2 min )
    Are Neural Architecture Search Benchmarks Well Designed? A Deeper Look Into Operation Importance. (arXiv:2303.16938v1 [cs.LG])
    Neural Architecture Search (NAS) benchmarks significantly improved the capability of developing and comparing NAS methods while at the same time drastically reduced the computational overhead by providing meta-information about thousands of trained neural networks. However, tabular benchmarks have several drawbacks that can hinder fair comparisons and provide unreliable results. These usually focus on providing a small pool of operations in heavily constrained search spaces -- usually cell-based neural networks with pre-defined outer-skeletons. In this work, we conducted an empirical analysis of the widely used NAS-Bench-101, NAS-Bench-201 and TransNAS-Bench-101 benchmarks in terms of their generability and how different operations influence the performance of the generated architectures. We found that only a subset of the operation pool is required to generate architectures close to the upper-bound of the performance range. Also, the performance distribution is negatively skewed, having a higher density of architectures in the upper-bound range. We consistently found convolution layers to have the highest impact on the architecture's performance, and that specific combination of operations favors top-scoring architectures. These findings shed insights on the correct evaluation and comparison of NAS methods using NAS benchmarks, showing that directly searching on NAS-Bench-201, ImageNet16-120 and TransNAS-Bench-101 produces more reliable results than searching only on CIFAR-10. Furthermore, with this work we provide suggestions for future benchmark evaluations and design. The code used to conduct the evaluations is available at https://github.com/VascoLopes/NAS-Benchmark-Evaluation.  ( 3 min )
    Leveraging joint sparsity in hierarchical Bayesian learning. (arXiv:2303.16954v1 [stat.ML])
    We present a hierarchical Bayesian learning approach to infer jointly sparse parameter vectors from multiple measurement vectors. Our model uses separate conditionally Gaussian priors for each parameter vector and common gamma-distributed hyper-parameters to enforce joint sparsity. The resulting joint-sparsity-promoting priors are combined with existing Bayesian inference methods to generate a new family of algorithms. Our numerical experiments, which include a multi-coil magnetic resonance imaging application, demonstrate that our new approach consistently outperforms commonly used hierarchical Bayesian methods.  ( 2 min )
    Sparse joint shift in multinomial classification. (arXiv:2303.16971v1 [stat.ML])
    Sparse joint shift (SJS) was recently proposed as a tractable model for general dataset shift which may cause changes to the marginal distributions of features and labels as well as the posterior probabilities and the class-conditional feature distributions. Fitting SJS for a target dataset without label observations may produce valid predictions of labels and estimates of class prior probabilities. We present new results on the transmission of SJS from sets of features to larger sets of features, a conditional correction formula for the class posterior probabilities under the target distribution, identifiability of SJS, and the relationship between SJS and covariate shift. In addition, we point out inconsistencies in the algorithms which were proposed for estimating the characteristics of SJS, as they could hamper the search for optimal solutions.  ( 2 min )
    Contextual Combinatorial Bandits with Probabilistically Triggered Arms. (arXiv:2303.17110v1 [cs.LG])
    We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, we find our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB~setting, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.  ( 2 min )
    Efficient distributed representations beyond negative sampling. (arXiv:2303.17475v1 [cs.LG])
    This article describes an efficient method to learn distributed representations, also known as embeddings. This is accomplished minimizing an objective function similar to the one introduced in the Word2Vec algorithm and later adopted in several works. The optimization computational bottleneck is the calculation of the softmax normalization constants for which a number of operations scaling quadratically with the sample size is required. This complexity is unsuited for large datasets and negative sampling is a popular workaround, allowing one to obtain distributed representations in linear time with respect to the sample size. Negative sampling consists, however, in a change of the loss function and hence solves a different optimization problem from the one originally proposed. Our contribution is to show that the sotfmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy to learn distributed representations. We test our approximation on two popular applications related to word and node embeddings. The results evidence competing performance in terms of accuracy with respect to negative sampling with a remarkably lower computational time.  ( 2 min )
    Federated Stochastic Bandit Learning with Unobserved Context. (arXiv:2303.17043v1 [cs.LG])
    We study the problem of federated stochastic multi-arm contextual bandits with unknown contexts, in which M agents are faced with different bandits and collaborate to learn. The communication model consists of a central server and the agents share their estimates with the central server periodically to learn to choose optimal actions in order to minimize the total regret. We assume that the exact contexts are not observable and the agents observe only a distribution of the contexts. Such a situation arises, for instance, when the context itself is a noisy measurement or based on a prediction mechanism. Our goal is to develop a distributed and federated algorithm that facilitates collaborative learning among the agents to select a sequence of optimal actions so as to maximize the cumulative reward. By performing a feature vector transformation, we propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions. Finally, we validated the performance of our algorithm and compared it with another baseline approach using numerical simulations on synthetic data and on the real-world movielens dataset.  ( 2 min )

  • Open

    I am an expert theologian. With ChatGPT expressing theory of mind, we can better understand the relationship between the physical and metaphysical:
    submitted by /u/azlef900 [link] [comments]  ( 42 min )
    Why worry about AI taking over your job? Tech illiterate and "perpetually busy" people will always be a source of income.
    I run a small web hosting company. I make money through monthly subscriptions to host and maintain my clients' websites. Do you know how much it costs me to run 25 out of the 100s of websites I administrate? Just $20/month of shared hosting. And I charge close to $100/month for each website. That's like turning $1 into $100. These are simple Wordpress websites There's tons of FREE resources out there. Articles, videos, PDFs, podcasts, etc. that show you how to buy a domain, setup nameservers, and start building your own website for your business. But all of my clients either don't have the TIME or they just don't have the KNOWLEDGE. You could argue that as soon as AI starts bridging the gap between user friendliness and mass adoption, I might lose the "tech illiterate" clients because they could just make their own websites. But still, there will always be people who do not have the time like lawyers, doctors, engineers, accountants, etc. In fact, I like having these people as clients because they never call my customer service line since they're always so busy. And even when it's easy, there's also a third type of client. The Lazy These are people who are too lazy to set up their own website. The task might seem too daunting for them so they never try. Or they might be chronic procrastinators and never get around to it. They might start watching a Youtube video and go, "nope. Fuck this. I'll just pay someone else to do it," which is where I come in. There will always be money to be made submitted by /u/Abad_Cal_Gawayam [link] [comments]  ( 44 min )
    AI has Humor?
    I asked Perplexity in German whether it can speak german too. Its answer was yes. So far so good. But the sites it linked me made it seem pretty passive aggressive... Here are 3 of the headlines: I’M SORRY, I DIDN’T UNDERSTAND YOU How to politely indicate that you only speak English and would like to continue in it? 11 Better Ways To Say “No Worries” In Professional Emails ​ https://preview.redd.it/nvffwz37ayqa1.png?width=807&format=png&auto=webp&s=d29780ac7814ee6fc3e4ef01f9688f3cd4202cc0 submitted by /u/kabserrr [link] [comments]  ( 42 min )
    Can We Predict Future Appearance Using AI?
    submitted by /u/AviatorPrints [link] [comments]  ( 43 min )
    IF AI start stealing programming jobs it's my boss that should be worried not me
    Close your eyes and think about it. Imagine it step by step. If we become 10X more productive and my boss fires 9 out of 10. what happens next ?? Of course my boss will not know which dev is the best just as they don't know when they give promotions( i'm not saying its me , the guy is not so social ) I will team up with best 3 of those 9 fired people, we are 4 hungry devs vs one spoiled big mouth sooner than you know I will be serious competition for my boss, only stealing 30 % of their business will make me very happy. and them very angry. they will start hiring again to make cool features and keep up with me. submitted by /u/generatedcode [link] [comments]  ( 43 min )
    How long until we can do this with ai.
    “Please quickly make and download an app that is similar to Reddit is fun but on iOS.”? submitted by /u/endrid [link] [comments]  ( 6 min )
    Multi-label NLP: An Analysis of Class Imbalance and Loss Function Approaches
    submitted by /u/DarronFeldstein [link] [comments]  ( 42 min )
    Artificial Intelligence: UNESCO calls on all Governments to implement Global Ethical Framework without delay
    submitted by /u/Linkology [link] [comments]  ( 42 min )
    Can Eliezer Yudkowsky even computer science?
    I keep seeing this guy come up in the news but every time I see his writings, I see him arguing with conceptual knowledge but does not demonstrate application. He seems to substitute mathematical and practical knowledge(something which would help prove his points on algorithms) with big words, graphs, and pretentious behavior. A lot of articles on MIRI(something he co-founded) talk about how smart he . His writings are more philosophy than CS and/or data science but I think if you pan yourself to be THE expert on houses, you should have built houses and maybe be able to write a fucking quick sort. So please someone link me something with him demonstrating calc1 and prove me wrong. I can't get through a highschool dropout thinking they know everything and everyone else is a moron. submitted by /u/puppetmasteria [link] [comments]  ( 7 min )
    Is There Any Tool To Change Your Voice? (Not An AI Voice, but manipulating a voice that’s already there to make it unrecognizable but still sound good)
    Any good voice changers out there? submitted by /u/AnakinRagnarsson66 [link] [comments]  ( 42 min )
    [LAION launches a petition to democratize AI research by establishing an international, publicly funded supercomputing facility equipped with 100,000 state-of-the-art AI accelerators to train open source foundation models.
    submitted by /u/acutelychronicpanic [link] [comments]  ( 43 min )
    Should I get a bachelor's in Computer Science or Artificial intelligence?
    I live in Germany and want to work in AI research in the future. There are some options where you can study artificial intelligence as a bachelor, but is that better than getting a degree in cs first and get into AI in the master? submitted by /u/2Tryhard4You [link] [comments]  ( 44 min )
    Best resource for Ai voice cloning?
    Looking for some different options for AI cloning, preferably something that is cheap or free. What are the best options out there? also I want to be able to use it on youtube submitted by /u/Knowledge-Is-The-Key [link] [comments]  ( 43 min )
    After Bard, Google unveils generative AI tool for Cloud developers
    submitted by /u/LickTempo [link] [comments]  ( 44 min )
    Which tool(s) can do ....
    I was simply looking to take one video and seperate image(s) of a t shirt and have the t shirt edited onto specific individuals in the original video. submitted by /u/usernamech00ser [link] [comments]  ( 42 min )
    Train ChatGPT generate unlimited prompts for you. Prompt: You are GPT-4, OpenAI's advanced language model. Today, your job is to generate prompts for GPT-4. Can you generate the best prompts on ways to
    submitted by /u/friuns [link] [comments]  ( 44 min )
    May God have mercy on our souls
    submitted by /u/thisguywonders [link] [comments]  ( 6 min )
    Ararat in an alternative universe
    [generated via AI] submitted by /u/Aram0070 [link] [comments]  ( 42 min )
    A Rebuttal to the Call for a Six-Month Pause on AI Development: Stifling Progress is Not the Solution (GPT 4)
    In a recent CBS News article, Michael Roppolo reported on an open letter signed by Elon Musk, Steve Wozniak, Andrew Yang, and over a thousand others, which calls for a six-month pause on AI development to address "profound risks to society and humanity." While the concerns raised by the signatories are valid, putting the brakes on AI development is not the most effective solution to the challenges we face. First, it is essential to acknowledge the rapid advancements in AI, as exemplified by OpenAI's GPT-4. However, the development of powerful AI technologies has also resulted in substantial benefits across various industries, such as healthcare, transportation, and agriculture. By calling for a blanket pause on AI development, we risk stifling the progress that could lead to life-saving b…  ( 52 min )
    I asked an AI program to write a script about Batman being a patient at Arkham Asylum, and it gave me a really good story.
    Title: Batman's Stay at Arkham Asylum Fade in: INT. ARKHAM ASYLUM - NIGHT A thunderstorm rages outside as the gates of Arkham Asylum open to reveal the imposing gothic facade. The camera moves in to see a gloomy cell where a figure is sitting on the bed with his head bowed down. The figure is none other than Batman. CUT TO: INT. ARKHAM ASYLUM – DAY Batman is escorted to a room where a doctor is waiting for him. Batman is shivering and sweating. The doctor, Dr. Crane, hands him a glass of water. DR. CRANE How are you feeling today, Batman? BATMAN (sarcastically) Oh, I’m just great. Insane, but great. Dr. Crane smiles. DR. CRANE Tell me, what brings you here, Batman? BATMAN (angrily) I don't belong here, Doctor. I'm not crazy. DR. CRANE We all have different definitions of what's crazy, Batman. CUT TO: INT. ARKHAM ASYLUM – NIGHT Batman is tossing and turning in his bed, unable to shake off the nightmares of his past. Suddenly, he hears a noise and sits up. A shadowy figure appears at the foot of his bed. It's The Joker. THE JOKER (smiling) Well, well, well. Look who we have here! BATMAN (angrily) What do you want, Joker? THE JOKER Oh, just came to say hi to my old friend. You're looking worse for wear, Bats. BATMAN (gritting his teeth) You put me here, Joker. You're the reason I'm in this place. THE JOKER (laughs) Come on, Bats. You know you want to go crazy like the rest of us. Batman lunges at The Joker, but the guards come in and subdue him. CUT TO: INT. ARKHAM ASYLUM – DAY Batman is sitting in front of Dr. Crane once again. DR. CRANE (looking at his notes) Your behavior has been erratic, Batman. You seem to be struggling with your past traumas. BATMAN (defensive) I can handle my own problems, Doctor. DR. CRANE But can you, Batman? You're here because you attacked The Joker. You're not his judge or jury. That's the law's job. Batman looks at Dr. Crane, realizing the truth in his words. FADE OUT. submitted by /u/Lady-Lilith289 [link] [comments]  ( 8 min )
  • Open

    . RFdiffusion: Diffusion model for generating protein backbones
    submitted by /u/nickb [link] [comments]  ( 41 min )
    Universal Speech Model - Towards Automatic Speech Recognition for All
    submitted by /u/nickb [link] [comments]  ( 41 min )
  • Open

    [P] Introducing Vicuna: An open-source language model based on LLaMA 13B
    We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90%* quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90%* of cases. The cost of training Vicuna-13B is around $300. The training and serving code, along with an online demo, are publicly available for non-commercial use. Training details Vicuna is created by fine-tuning a LLaMA base model using approximately 70K user-shared conversations gathered from ShareGPT.com with public APIs. To ensure data quality, we convert the HTML back to markdown and filter out some inappropriate or low-quality samples. Additional…  ( 48 min )
    [P] Reinforcement Learning: Defining an Action Space
    I have a somewhat atypical RL setting, and I am thinking about how to define the action space, with the intention of trying which representation works best, as well as applying different RL algorithms. I will explain my thoughts and would be thankful for your feedback. The environment essentially consists of items that can be consumed by the agent. The effect of the consumption determines the reward (+1 good, 0 neutral). The amount of existing items is infinite. Each item is represented by an feature vector a, with its elements consisting of real numbers in [-1, 1]. The state is, among other information, a list of consumed items in the order of consumption. Each state can also be represented with a feature vector s. Naturally, there are various approaches to constructing s, but that is a…  ( 9 min )
    [D] What are your top 3 pain points as an ML developer in 2023?
    As an ML developer, what are the top 3 pain points that you would love to see solved. This could be something very specific (Example: I spend a lot of time in building a model architecture using Keras) or something broad (Example: I struggle with Model Serving because of the lack of easy to use infrastructure). Also, please mention how you workaround these paint points. Looking forward to the discussion. Thank you! submitted by /u/General-Wing-785 [link] [comments]  ( 43 min )
    [D] Dataset to figuring out LLM's zero shot memorization capabilities?
    What’s the best dataset to evaluate a foundation model’s capabilities on non-multiple choice trivia memorization? I’m looking for something like Q. When did Obama first become president? A: 2009 Q. What’s the capital of India? A: New Delhi I’ve looked at TriviaQA, MMLU, etc. However, they are multiple choice (or) have reading comprehension, and not always one phrase answer. I’m mostly looking at evaluating memorization capabilities without any external datasources. submitted by /u/Economy-Pipe-6184 [link] [comments]  ( 43 min )
    [R] TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs - Yaobo Liang et al Microsoft 2023
    Paper: https://arxiv.org/abs/2303.16434 Abstract: Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still face difficulties with some specialized tasks because they lack enough domain specific data during pre-training or they often have errors in their neural network computations on those tasks that need accurate executions. On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain …  ( 45 min )
    [D] AI Explainability and Alignment through Natural Language Internal Interfaces
    Has there been much recent literature on leveraging natural language internal interfaces in AI systems, or what are your thoughts in this space? Perhaps not yet implementation, but at least discussion on how it pertains to consciousness and AI alignment? Examples of potential implementations: LLM with a final gate that chooses to output the next word to the external reader or to an internal stream. LLM output is a function of both the internal stream and the external prompt. Collection of LLMs that engage in a dialogue based on external prompt. External output is a function of a final LLM that parses this dialogue + the external prompt. Loss function includes a critic that evaluates the grammatical structure of a hidden state (after passed through a decoder) at at specific points i…  ( 7 min )
    [D] Turns out, Othello-GPT does have a world model.
    There is this research post by Neel Nanda which states that Othello-GPT has a linear emergent world representation. What does it mean (as I am mostly a novice) and what do you all think about it? link :- https://www.neelnanda.io/mechanistic-interpretability/othello submitted by /u/Desi___Gigachad [link] [comments]  ( 6 min )
    [D] Directed Graph-based Machine Learning Pipeline tool?
    I recall seeing a tool where you could visualize a ML pipeline as a graph, but I'm at a loss for finding it again. Do such tools really exist? submitted by /u/Driiper [link] [comments]  ( 43 min )
    [R] AdaSubS: An Adaptive RL Algorithm (top 5% oral ICLR2023)
    We are pleased to announce the publication of our research paper on AdaSubS, an adaptive RL algorithm that ranked in the top 5% at ICLR2023. Paper: [2206.00702] Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search (arxiv.org) Code: AdaptiveSubgoalSearch/adaptive_subs: Adaptive Subgoal Search (github.com) Colab (demo): AdaSubS - demo - Colaboratory (google.com) AdaSubS is based on a subgoal search and adjusts its planning horizon according to the complexity of the problem, making it efficient and precise. It performs well in domains that require combinatorial reasoning, such as proving mathematical inequalities. ​ https://preview.redd.it/3jcwgwc53wqa1.png?width=1612&format=png&auto=webp&s=27394f4c0586f21827a71d83029b8a6303865697 submitted by /u/yyohama [link] [comments]  ( 43 min )
    [D] AI Policy Group CAIDP Asks FTC To Stop OpenAI From Launching New GPT Models
    The Center for AI and Digital Policy (CAIDP), a tech ethics group, has asked the Federal Trade Commission to investigate OpenAI for violating consumer protection rules. CAIDP claims that OpenAI's AI text generation tools have been "biased, deceptive, and a risk to public safety." CAIDP's complaint raises concerns about potential threats from OpenAI's GPT-4 generative text model, which was announced in mid-March. It warns of the potential for GPT-4 to produce malicious code and highly tailored propaganda and the risk that biased training data could result in baked-in stereotypes or unfair race and gender preferences in hiring. The complaint also mentions significant privacy failures with OpenAI's product interface, such as a recent bug that exposed OpenAI ChatGPT histories and possibly payment details of ChatGPT plus subscribers. CAIDP seeks to hold OpenAI accountable for violating Section 5 of the FTC Act, which prohibits unfair and deceptive trade practices. The complaint claims that OpenAI knowingly released GPT-4 to the public for commercial use despite the risks, including potential bias and harmful behavior. Source | Case| PDF submitted by /u/vadhavaniyafaijan [link] [comments]  ( 55 min )
    [D] Best deal with varying number of inputs each with variable size using and RNN? (for an NLP task)
    Hollo, I am tying to do personality trait prediction using Facebook posts and I'm currently facing an issue, as I have multiple users each with a different number of posts and each post has different lenght as well. I am using BERT to get embeddings for each word in the post and using multiple other feature extraction methods to get additional features per post (sentiment analysis, TF-IDF, etc). So the prediction would be made for each user, the input having the size of the number of posts and each of those would be comprised of N embeddings (N = number of words in a post) as well as the additional features. The issue I'm facing is that I don't know how to design my prediction model in order to deal with these 2 varying inputs sizes. If I had variable number of inputs of the same size I could simply use an RNN, but here that doesn't work since the number of words per post varies. What architecture could I use for this? I considered using an RNN to process the word embeddings and TF-IDF scores (the features which size varies) to process into a fixed size output which would be combined with the other features and inserted into a second RNN to predict the personality scores. Another option that I consideres is simply padding the input but I don't know if this will decrease the accuracy in a significant manner. submitted by /u/danilo62 [link] [comments]  ( 45 min )
    [R] Interview for degree project ML/GD/AI
    I'm currently writing my bachelor thesis. My group and I are looking for candidates that could participate in an interview about GD/AI. If anyone here has experience working with Generative Design/AI systems within a company or if you have evaluated working with it but decided not to, we would love to have you participate. submitted by /u/Personal-Resolve8632 [link] [comments]  ( 7 min )
    [R] [P] Alzheimer's Disease Identification and Classification: Should you use .nii or .jpg format for ADNI Dataset?
    Hey everyone, I'm currently working on Alzheimer's Disease identification and classification using the ADNI (Alzheimer's Disease Neuroimaging Initiative) dataset. While working with the dataset, I faced a question: should I use the .nii format or convert it to .jpg? submitted by /u/Yaciine826 [link] [comments]  ( 43 min )
    [P] Quick and easy assessment of table dataset predictability
    How easy? As easy as: python usap_csv_eval.py data/credit-approval.csv If your dataset is in csv format you can use this tool to get an initial indication of how predictable a target feature is. No need to sort attributes, look for missing cells etc. The tool uses "deodel" as a robust mixed attribute classifier. Get more details at: csv_dataset_eval.ipynb submitted by /u/zx2zx [link] [comments]  ( 44 min )
    [N] Object detection with YOLO and extreme clicking in a semi automatic combination
    Extreme clicking is a powerful tool that can save you time on machine learning tasks.To test further optimizations, I combined this tool with a tiny YOLO v4. The results show that the semi-automatic method can save much more time. The method was tested on a dataset from Open Image. There were 443 images containing 689 objects in five different classes. The results show that compared to a manual approach using bounding boxes, 56% of time can be saved. For 689 objects, the needed time was reduced from 191 to 84 minutes. At the same time, the data quality achieved by semi-automatic extreme clicking reached a mAP of 76%. Furthermore, 579 of the annotated objects achieved an IOU of more than 70%. For more results and a step-by-step guide to use a semi-automatic annotation tool follow the link to medium blog. https://preview.redd.it/8m07whlmguqa1.png?width=968&format=png&auto=webp&s=92468c728fcb7703ea1211e94d465e2e627fc292 submitted by /u/FriendlyHelper0707 [link] [comments]  ( 44 min )
    [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
    submitted by /u/floppy_llama [link] [comments]  ( 43 min )
  • Open

    Defining an Action Space
    I have a somewhat atypical RL setting, and I am thinking about how to define the action space, with the intention of trying which representation works best, as well as applying different RL algorithms. I will explain my thoughts and would be thankful for your feedback. The environment essentially consists of items that can be consumed by the agent. The effect of the consumption determines the reward (+1 good, 0 neutral). The amount of existing items is infinite. Each item is represented by an feature vector a, with its elements consisting of real numbers in [-1, 1]. The state is, among other information, a list of consumed items in the order of consumption. Each state can also be represented with a feature vector s. Naturally, there are various approaches to constructing s, but that is a…  ( 47 min )
    (Newbie question)How to solve using reinforcement learning 2x2 rubik's cube which has 2^336 states without ValueError?
    I made 6x2x2 numpy array representing a 2x2 rubik's cube which has size of 336 bits. So there is 2^336 states(,right?) Then I tried creating q table with 2^336(states) and 12(actions) dimension And got ValueError: Maximum allowed dimension exceeded on python(numpy error) How do I do it without the error? Or number of states isn't 2^336? ,Thank you submitted by /u/UWUggAh [link] [comments]  ( 44 min )
    Recurrent actor and critic in MARL
    In MARL, is it best to have a recurrent net after which we separate actor and critic, or rather create two different classes with two different recurrent nets? Can you help me understand the differences? submitted by /u/No_Possibility_7588 [link] [comments]  ( 6 min )
    Resources for 2D Hopper locomotion using RL
    Please provide any know any github repos or any other code sources which guide on how to make a 2D hopper like the one from open AI gym to hop/locomote using RL. Really appreciate any help. submitted by /u/danny_varma [link] [comments]  ( 41 min )
  • Open

    A method for designing neural networks optimally suited for certain tasks
    With the right building blocks, machine-learning models can more accurately perform tasks like fraud detection or spam filtering.  ( 9 min )
  • Open

    Snapper provides machine learning-assisted labeling for pixel-perfect image object detection
    Bounding box annotation is a time-consuming and tedious task that requires annotators to create annotations that tightly fit an object’s boundaries. Bounding box annotation tasks, for example, require annotators to ensure that all edges of an annotated object are enclosed in the annotation. In practice, creating annotations that are precise and well-aligned to object edges […]  ( 7 min )
    Recommend top trending items to your users using the new Amazon Personalize recipe
    Amazon Personalize is excited to announce the new Trending-Now recipe to help you recommend items gaining popularity at the fastest pace among your users. Amazon Personalize is a fully managed machine learning (ML) service that makes it easy for developers to deliver personalized experiences to their users. It enables you to improve customer engagement by […]  ( 10 min )
    Bundesliga Match Fact Ball Recovery Time: Quantifying teams’ success in pressing opponents on AWS
    In football, ball possession is a strong predictor for team success. It’s hard to control the game without having control over the ball. In the past three Bundesliga seasons, as well as in the current season (at the time of this writing), Bayern Munich is ranked first in the table and in ball possession percentage, […]  ( 8 min )
    Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS
    The Bundesliga is renowned for its exceptional goalkeepers, making it potentially the most prominent among Europe’s top five leagues in this regard. Apart from the widely recognized Manuel Neuer, the Bundesliga has produced remarkable goalkeepers who have excelled in other leagues, including the likes of Marc-André ter Stegen, who is a superstar at Barcelona. In […]  ( 9 min )
  • Open

    AI and the Future of Health
    Read about some of our research team’s work to make healthcare more data-driven, predictive, and precise – ultimately, empowering every person on the planet to live a healthier future. The post AI and the Future of Health appeared first on Microsoft Research.  ( 14 min )
    AI Frontiers: AI for health and the future of research with Peter Lee
    Powerful new large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come. In this new Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these […] The post AI Frontiers: AI for health and the future of research with Peter Lee appeared first on Microsoft Research.  ( 27 min )
  • Open

    “There is no privacy. Get over it.”
    Back in the 2000s, a mentor of mine often quoted then-CEO Scott McNeely of Sun Microsystems (now part of Oracle), who said, ”There is no privacy. Get over it.” My mentor often said as well that many people would give away their data for a “free” T-shirt. We brainstormed about personal data security by imagining… Read More »“There is no privacy. Get over it.” The post “There is no privacy. Get over it.” appeared first on Data Science Central.  ( 21 min )
  • Open

    Data-centric ML benchmarking: Announcing DataPerf’s 2023 challenges
    Posted by Peter Mattson, Senior Staff Engineer, ML Performance, and Praveen Paritosh, Senior Research Scientist, Google Research, Brain Team Machine learning (ML) offers tremendous potential, from diagnosing cancer to engineering safe self-driving cars to amplifying human productivity. To realize this potential, however, organizations need ML solutions to be reliable with ML solution development that is predictable and tractable. The key to both is a deeper understanding of ML data — how to engineer training datasets that produce high quality models and test datasets that deliver accurate indicators of how close we are to solving the target problem. The process of creating high quality datasets is complicated and error-prone, from the initial selection and cleaning of raw data, to lab…  ( 92 min )
  • Open

    April Showers Bring 23 New GeForce NOW Games Including ‘Have a Nice Death’
    It’s another rewarding GFN Thursday, with 23 new games for April on top of 11 joining the cloud this week and a new Marvel’s Midnight Suns reward now available first for GeForce NOW Premium members. Newark, N.J., is next to complete its upgrade to RTX 4080 SuperPODs, making it the 12th region worldwide to bring Read article >  ( 6 min )
  • Open

    Autoregressive Models, OOD Prompts and the Interpolation Regime
    A few years ago I was very much into maximum likelihood-based generative modeling and autoregressive models (see this, this or this). More recently, my focus shifted to characterising inductive biases of gradient-based optimization focussing mostly on supervised learning. I only very recently started combining the two ideas, revisiting autoregressive models  ( 6 min )
  • Open

    A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI. (arXiv:2303.16376v1 [cs.LG])
    Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation, therefore, require a signal representation that extends over the radial as well as angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required since the learning process relied on various middle representations, such as simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. From the experimental results, the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.  ( 3 min )
    Dice Semimetric Losses: Optimizing the Dice Score with Soft Labels. (arXiv:2303.16296v1 [cs.CV])
    The soft Dice loss (SDL) has taken a pivotal role in many automated segmentation pipelines in the medical imaging community. Over the last years, some reasons behind its superior functioning have been uncovered and further optimizations have been explored. However, there is currently no implementation that supports its direct use in settings with soft labels. Hence, a synergy between the use of SDL and research leveraging the use of soft labels, also in the context of model calibration, is still missing. In this work, we introduce Dice semimetric losses (DMLs), which (i) are by design identical to SDL in a standard setting with hard labels, but (ii) can be used in settings with soft labels. Our experiments on the public QUBIQ, LiTS and KiTS benchmarks confirm the potential synergy of DMLs with soft labels (e.g. averaging, label smoothing, and knowledge distillation) over hard labels (e.g. majority voting and random selection). As a result, we obtain superior Dice scores and model calibration, which supports the wider adoption of DMLs in practice. Code is available at \href{https://github.com/zifuwanggg/JDTLosses}{https://github.com/zifuwanggg/JDTLosses}.  ( 2 min )
    Operator learning with PCA-Net: upper and lower complexity bounds. (arXiv:2303.16317v1 [cs.LG])
    Neural operators are gaining attention in computational science and engineering. PCA-Net is a recently proposed neural operator architecture which combines principal component analysis (PCA) with neural networks to approximate an underlying operator. The present work develops approximation theory for this approach, improving and significantly extending previous work in this direction. In terms of qualitative bounds, this paper derives a novel universal approximation result, under minimal assumptions on the underlying operator and the data-generating distribution. In terms of quantitative bounds, two potential obstacles to efficient operator learning with PCA-Net are identified, and made rigorous through the derivation of lower complexity bounds; the first relates to the complexity of the output distribution, measured by a slow decay of the PCA eigenvalues. The other obstacle relates the inherent complexity of the space of operators between infinite-dimensional input and output spaces, resulting in a rigorous and quantifiable statement of the curse of dimensionality. In addition to these lower bounds, upper complexity bounds are derived; first, a suitable smoothness criterion is shown to ensure a algebraic decay of the PCA eigenvalues. Then, it is shown that PCA-Net can overcome the general curse of dimensionality for specific operators of interest, arising from the Darcy flow and Navier-Stokes equations.  ( 2 min )
    When to Pre-Train Graph Neural Networks? An Answer from Data Generation Perspective!. (arXiv:2303.16458v1 [cs.LG])
    Recently, graph pre-training has attracted wide research attention, which aims to learn transferable knowledge from unlabeled graph data so as to improve downstream performance. Despite these recent attempts, the negative transfer is a major issue when applying graph pre-trained models to downstream tasks. Existing works made great efforts on the issue of what to pre-train and how to pre-train by designing a number of graph pre-training and fine-tuning strategies. However, there are indeed cases where no matter how advanced the strategy is, the "pre-train and fine-tune" paradigm still cannot achieve clear benefits. This paper introduces a generic framework W2PGNN to answer the crucial question of when to pre-train (i.e., in what situations could we take advantage of graph pre-training) before performing effortful pre-training or fine-tuning. We start from a new perspective to explore the complex generative mechanisms from the pre-training data to downstream data. In particular, W2PGNN first fits the pre-training data into graphon bases, each element of graphon basis (i.e., a graphon) identifies a fundamental transferable pattern shared by a collection of pre-training graphs. All convex combinations of graphon bases give rise to a generator space, from which graphs generated form the solution space for those downstream data that can benefit from pre-training. In this manner, the feasibility of pre-training can be quantified as the generation probability of the downstream data from any generator in the generator space. W2PGNN provides three broad applications, including providing the application scope of graph pre-trained models, quantifying the feasibility of performing pre-training, and helping select pre-training data to enhance downstream performance. We give a theoretically sound solution for the first application and extensive empirical justifications for the latter two applications.  ( 3 min )
    Maximum likelihood smoothing estimation in state-space models: An incomplete-information based approach. (arXiv:2303.16364v1 [stat.ME])
    This paper revisits classical works of Rauch (1963, et al. 1965) and develops a novel method for maximum likelihood (ML) smoothing estimation from incomplete information/data of stochastic state-space systems. Score function and conditional observed information matrices of incomplete data are introduced and their distributional identities are established. Using these identities, the ML smoother $\widehat{x}_{k\vert n}^s =\argmax_{x_k} \log f(x_k,\widehat{x}_{k+1\vert n}^s, y_{0:n}\vert\theta)$, $k\leq n-1$, is presented. The result shows that the ML smoother gives an estimate of state $x_k$ with more adherence of loglikehood having less standard errors than that of the ML state estimator $\widehat{x}_k=\argmax_{x_k} \log f(x_k,y_{0:k}\vert\theta)$, with $\widehat{x}_{n\vert n}^s=\widehat{x}_n$. Recursive estimation is given in terms of an EM-gradient-particle algorithm which extends the work of \cite{Lange} for ML smoothing estimation. The algorithm has an explicit iteration update which lacks in (\cite{Ramadan}) EM-algorithm for smoothing. A sequential Monte Carlo method is developed for valuation of the score function and observed information matrices. A recursive equation for the covariance matrix of estimation error is developed to calculate the standard errors. In the case of linear systems, the method shows that the Rauch-Tung-Striebel (RTS) smoother is a fully efficient smoothing state-estimator whose covariance matrix coincides with the Cram\'er-Rao lower bound, the inverse of expected information matrix. Furthermore, the RTS smoother coincides with the Kalman filter having less covariance matrix. Numerical studies are performed, confirming the accuracy of the main results.  ( 3 min )
    GNNBuilder: An Automated Framework for Generic Graph Neural Network Accelerator Generation, Simulation, and Optimization. (arXiv:2303.16459v1 [cs.AR])
    There are plenty of graph neural network (GNN) accelerators being proposed. However, they highly rely on users' hardware expertise and are usually optimized for one specific GNN model, making them challenging for practical use . Therefore, in this work, we propose GNNBuilder, the first automated, generic, end-to-end GNN accelerator generation framework. It features four advantages: (1) GNNBuilder can automatically generate GNN accelerators for a wide range of GNN models arbitrarily defined by users; (2) GNNBuilder takes standard PyTorch programming interface, introducing zero overhead for algorithm developers; (3) GNNBuilder supports end-to-end code generation, simulation, accelerator optimization, and hardware deployment, realizing a push-button fashion for GNN accelerator design; (4) GNNBuilder is equipped with accurate performance models of its generated accelerator, enabling fast and flexible design space exploration (DSE). In the experiments, first, we show that our accelerator performance model has errors within $36\%$ for latency prediction and $18\%$ for BRAM count prediction. Second, we show that our generated accelerators can outperform CPU by $6.33\times$ and GPU by $6.87\times$. This framework is open-source, and the code is available at https://anonymous.4open.science/r/gnn-builder-83B4/.  ( 2 min )
    A Comprehensive and Versatile Multimodal Deep Learning Approach for Predicting Diverse Properties of Advanced Materials. (arXiv:2303.16412v1 [cond-mat.soft])
    We present a multimodal deep learning (MDL) framework for predicting physical properties of a 10-dimensional acrylic polymer composite material by merging physical attributes and chemical data. Our MDL model comprises four modules, including three generative deep learning models for material structure characterization and a fourth model for property prediction. Our approach handles an 18-dimensional complexity, with 10 compositional inputs and 8 property outputs, successfully predicting 913,680 property data points across 114,210 composition conditions. This level of complexity is unprecedented in computational materials science, particularly for materials with undefined structures. We propose a framework to analyze the high-dimensional information space for inverse material design, demonstrating flexibility and adaptability to various materials and scales, provided sufficient data is available. This study advances future research on different materials and the development of more sophisticated models, drawing us closer to the ultimate goal of predicting all properties of all materials.  ( 2 min )
    Lifting uniform learners via distributional decomposition. (arXiv:2303.16208v1 [stat.ML])
    We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$, running in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose pmfs are computed by depth-$d$ decision trees, where $m$ is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from $\mathcal{D}$, and for general ones it uses subcube conditioning samples. A key technical ingredient is an algorithm which, given the aforementioned access to $\mathcal{D}$, produces an optimal decision tree decomposition of $\mathcal{D}$: an approximation of $\mathcal{D}$ as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art -- results of independent interest in distribution learning.  ( 2 min )
    Conductivity Imaging from Internal Measurements with Mixed Least-Squares Deep Neural Networks. (arXiv:2303.16454v1 [math.NA])
    In this work we develop a novel approach using deep neural networks to reconstruct the conductivity distribution in elliptic problems from one internal measurement. The approach is based on a mixed reformulation of the governing equation and utilizes the standard least-squares objective to approximate the conductivity and flux simultaneously, with deep neural networks as ansatz functions. We provide a thorough analysis of the neural network approximations for both continuous and empirical losses, including rigorous error estimates that are explicit in terms of the noise level, various penalty parameters and neural network architectural parameters (depth, width and parameter bound). We also provide extensive numerical experiments in two- and multi-dimensions to illustrate distinct features of the approach, e.g., excellent stability with respect to data noise and capability of solving high-dimensional problems.  ( 2 min )
    Reinforcement learning for optimization of energy trading strategy. (arXiv:2303.16266v1 [cs.LG])
    An increasing part of energy is produced from renewable sources by a large number of small producers. The efficiency of these sources is volatile and, to some extent, random, exacerbating the energy market balance problem. In many countries, that balancing is performed on day-ahead (DA) energy markets. In this paper, we consider automated trading on a DA energy market by a medium size prosumer. We model this activity as a Markov Decision Process and formalize a framework in which a ready-to-use strategy can be optimized with real-life data. We synthesize parametric trading strategies and optimize them with an evolutionary algorithm. We also use state-of-the-art reinforcement learning algorithms to optimize a black-box trading strategy fed with available information from the environment that can impact future prices.  ( 2 min )
    Hard Regularization to Prevent Collapse in Online Deep Clustering without Data Augmentation. (arXiv:2303.16521v1 [cs.LG])
    Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed. While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster. Successful existing models have employed various techniques to avoid this problem, most of which require data augmentation or which aim to make the average soft assignment across the dataset the same for each cluster. We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments. Using a Bayesian framework, we derive an intuitive optimization objective that can be straightforwardly included in the training of the encoder network. Tested on four image datasets, we show that it consistently avoids collapse more robustly than other methods and that it leads to more accurate clustering. We also conduct further experiments and analyses justifying our choice to regularize the hard cluster assignments.  ( 2 min )
    Unified analysis of SGD-type methods. (arXiv:2303.16502v1 [math.OC])
    This note focuses on a simple approach to the unified analysis of SGD-type methods from (Gorbunov et al., 2020) for strongly convex smooth optimization problems. The similarities in the analyses of different stochastic first-order methods are discussed along with the existing extensions of the framework. The limitations of the analysis and several alternative approaches are mentioned as well.  ( 2 min )
    Lipschitzness Effect of a Loss Function on Generalization Performance of Deep Neural Networks Trained by Adam and AdamW Optimizers. (arXiv:2303.16464v1 [cs.LG])
    The generalization performance of deep neural networks with regard to the optimization algorithm is one of the major concerns in machine learning. This performance can be affected by various factors. In this paper, we theoretically prove that the Lipschitz constant of a loss function is an important factor to diminish the generalization error of the output model obtained by Adam or AdamW. The results can be used as a guideline for choosing the loss function when the optimization algorithm is Adam or AdamW. In addition, to evaluate the theoretical bound in a practical setting, we choose the human age estimation problem in computer vision. For assessing the generalization better, the training and test datasets are drawn from different distributions. Our experimental evaluation shows that the loss function with lower Lipschitz constant and maximum value improves the generalization of the model trained by Adam or AdamW.  ( 2 min )
    Parameterizing the cost function of Dynamic Time Warping with application to time series classification. (arXiv:2301.10350v2 [cs.LG] UPDATED)
    Dynamic Time Warping (DTW) is a popular time series distance measure that aligns the points in two series with one another. These alignments support warping of the time dimension to allow for processes that unfold at differing rates. The distance is the minimum sum of costs of the resulting alignments over any allowable warping of the time dimension. The cost of an alignment of two points is a function of the difference in the values of those points. The original cost function was the absolute value of this difference. Other cost functions have been proposed. A popular alternative is the square of the difference. However, to our knowledge, this is the first investigation of both the relative impacts of using different cost functions and the potential to tune cost functions to different tasks. We do so in this paper by using a tunable cost function {\lambda}{\gamma} with parameter {\gamma}. We show that higher values of {\gamma} place greater weight on larger pairwise differences, while lower values place greater weight on smaller pairwise differences. We demonstrate that training {\gamma} significantly improves the accuracy of both the DTW nearest neighbor and Proximity Forest classifiers.
    On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective. (arXiv:2302.12095v4 [cs.AI] UPDATED)
    ChatGPT is a recent chatbot service released by OpenAI and is receiving increasing attention over the past few months. While evaluations of various aspects of ChatGPT have been done, its robustness, i.e., the performance to unexpected inputs, is still unclear to the public. Robustness is of particular concern in responsible AI, especially for safety-critical applications. In this paper, we conduct a thorough evaluation of the robustness of ChatGPT from the adversarial and out-of-distribution (OOD) perspective. To do so, we employ the AdvGLUE and ANLI benchmarks to assess adversarial robustness and the Flipkart review and DDXPlus medical diagnosis datasets for OOD evaluation. We select several popular foundation models as baselines. Results show that ChatGPT shows consistent advantages on most adversarial and OOD classification and translation tasks. However, the absolute performance is far from perfection, which suggests that adversarial and OOD robustness remains a significant threat to foundation models. Moreover, ChatGPT shows astounding performance in understanding dialogue-related texts and we find that it tends to provide informal suggestions for medical tasks instead of definitive answers. Finally, we present in-depth discussions of possible research directions.
    Beyond Empirical Risk Minimization: Local Structure Preserving Regularization for Improving Adversarial Robustness. (arXiv:2303.16861v1 [cs.LG])
    It is broadly known that deep neural networks are susceptible to being fooled by adversarial examples with perturbations imperceptible by humans. Various defenses have been proposed to improve adversarial robustness, among which adversarial training methods are most effective. However, most of these methods treat the training samples independently and demand a tremendous amount of samples to train a robust network, while ignoring the latent structural information among these samples. In this work, we propose a novel Local Structure Preserving (LSP) regularization, which aims to preserve the local structure of the input space in the learned embedding space. In this manner, the attacking effect of adversarial samples lying in the vicinity of clean samples can be alleviated. We show strong empirical evidence that with or without adversarial training, our method consistently improves the performance of adversarial robustness on several image classification datasets compared to the baselines and some state-of-the-art approaches, thus providing promising direction for future research.
    Generalizable Denoising of Microscopy Images using Generative Adversarial Networks and Contrastive Learning. (arXiv:2303.15214v2 [eess.IV] UPDATED)
    Microscopy images often suffer from high levels of noise, which can hinder further analysis and interpretation. Content-aware image restoration (CARE) methods have been proposed to address this issue, but they often require large amounts of training data and suffer from over-fitting. To overcome these challenges, we propose a novel framework for few-shot microscopy image denoising. Our approach combines a generative adversarial network (GAN) trained via contrastive learning (CL) with two structure preserving loss terms (Structural Similarity Index and Total Variation loss) to further improve the quality of the denoised images using little data. We demonstrate the effectiveness of our method on three well-known microscopy imaging datasets, and show that we can drastically reduce the amount of training data while retaining the quality of the denoising, thus alleviating the burden of acquiring paired data and enabling few-shot learning. The proposed framework can be easily extended to other image restoration tasks and has the potential to significantly advance the field of microscopy image analysis.
    Instance-Aware Image Completion. (arXiv:2210.12350v2 [cs.CV] UPDATED)
    Image completion is a task that aims to fill in the missing region of a masked image with plausible contents. However, existing image completion methods tend to fill in the missing region with the surrounding texture instead of hallucinating a visual instance that is suitable in accordance with the context of the scene. In this work, we propose a novel image completion model, dubbed ImComplete, that hallucinates the missing instance that harmonizes well with - and thus preserves - the original context. ImComplete first adopts a transformer architecture that considers the visible instances and the location of the missing region. Then, ImComplete completes the semantic segmentation masks within the missing region, providing pixel-level semantic and structural guidance. Finally, the image synthesis blocks generate photo-realistic content. We perform a comprehensive evaluation of the results in terms of visual quality (LPIPS and FID) and contextual preservation scores (CLIPscore and object detection accuracy) with COCO-panoptic and Visual Genome datasets. Experimental results show the superiority of ImComplete on various natural images.
    Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging. (arXiv:2302.11510v3 [cs.LG] UPDATED)
    Selective experience replay is a popular strategy for integrating lifelong learning with deep reinforcement learning. Selective experience replay aims to recount selected experiences from previous tasks to avoid catastrophic forgetting. Furthermore, selective experience replay based techniques are model agnostic and allow experiences to be shared across different models. However, storing experiences from all previous tasks make lifelong learning using selective experience replay computationally very expensive and impractical as the number of tasks increase. To that end, we propose a reward distribution-preserving coreset compression technique for compressing experience replay buffers stored for selective experience replay. We evaluated the coreset compression technique on the brain tumor segmentation (BRATS) dataset for the task of ventricle localization and on the whole-body MRI for localization of left knee cap, left kidney, right trochanter, left lung, and spleen. The coreset lifelong learning models trained on a sequence of 10 different brain MR imaging environments demonstrated excellent performance localizing the ventricle with a mean pixel error distance of 12.93 for the compression ratio of 10x. In comparison, the conventional lifelong learning model localized the ventricle with a mean pixel distance of 10.87. Similarly, the coreset lifelong learning models trained on whole-body MRI demonstrated no significant difference (p=0.28) between the 10x compressed coreset lifelong learning models and conventional lifelong learning models for all the landmarks. The mean pixel distance for the 10x compressed models across all the landmarks was 25.30, compared to 19.24 for the conventional lifelong learning models. Our results demonstrate that the potential of the coreset-based ERB compression method for compressing experiences without a significant drop in performance.
    Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments. (arXiv:2106.10365v2 [cs.LG] UPDATED)
    The capability of a reinforcement learning (RL) agent heavily depends on the diversity of the learning scenarios generated by the environment. Generation of diverse realistic scenarios is challenging for real-time strategy (RTS) environments. The RTS environments are characterized by intelligent entities/non-RL agents cooperating and competing with the RL agents with large state and action spaces over a long period of time, resulting in an infinite space of feasible, but not necessarily realistic, scenarios involving complex interaction among different RL and non-RL agents. Yet, most of the existing simulators rely on randomly generating the environments based on predefined settings/layouts and offer limited flexibility and control over the environment dynamics for researchers to generate diverse, realistic scenarios as per their demand. To address this issue, for the first time, we formally introduce the benefits of adopting an existing formal scenario specification language, SCENIC, to assist researchers to model and generate diverse scenarios in an RTS environment in a flexible, systematic, and programmatic manner. To showcase the benefits, we interfaced SCENIC to an existing RTS environment Google Research Football(GRF) simulator and introduced a benchmark consisting of 32 realistic scenarios, encoded in SCENIC, to train RL agents and testing their generalization capabilities. We also show how researchers/RL practitioners can incorporate their domain knowledge to expedite the training process by intuitively modeling stochastic programmatic policies with SCENIC.
    Cooperative Retriever and Ranker in Deep Recommenders. (arXiv:2206.14649v2 [cs.IR] UPDATED)
    Deep recommender systems (DRS) are intensively applied in modern web services. To deal with the massive web contents, DRS employs a two-stage workflow: retrieval and ranking, to generate its recommendation results. The retriever aims to select a small set of relevant candidates from the entire items with high efficiency; while the ranker, usually more precise but time-consuming, is supposed to further refine the best items from the retrieved candidates. Traditionally, the two components are trained either independently or within a simple cascading pipeline, which is prone to poor collaboration effect. Though some latest works suggested to train retriever and ranker jointly, there still exist many severe limitations: item distribution shift between training and inference, false negative, and misalignment of ranking order. As such, it remains to explore effective collaborations between retriever and ranker.
    Editing Models with Task Arithmetic. (arXiv:2212.04089v2 [cs.LG] UPDATED)
    Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems. In this work, we propose a new paradigm for steering the behavior of neural networks, centered around \textit{task vectors}. A task vector specifies a direction in the weight space of a pre-trained model, such that movement in that direction improves performance on the task. We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task. We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition, and the behavior of the resulting model is steered accordingly. Negating a task vector decreases performance on the target task, with little change in model behavior on control tasks. Moreover, adding task vectors together can improve performance on multiple tasks at once. Finally, when tasks are linked by an analogy relationship of the form ``A is to B as C is to D", combining task vectors from three of the tasks can improve performance on the fourth, even when no data from the fourth task is used for training. Overall, our experiments with several models, modalities and tasks show that task arithmetic is a simple, efficient and effective way of editing models.
    Computationally Efficient Labeling of Cancer Related Forum Posts by Non-Clinical Text Information Retrieval. (arXiv:2303.16766v1 [cs.IR])
    An abundance of information about cancer exists online, but categorizing and extracting useful information from it is difficult. Almost all research within healthcare data processing is concerned with formal clinical data, but there is valuable information in non-clinical data too. The present study combines methods within distributed computing, text retrieval, clustering, and classification into a coherent and computationally efficient system, that can clarify cancer patient trajectories based on non-clinical and freely available information. We produce a fully-functional prototype that can retrieve, cluster and present information about cancer trajectories from non-clinical forum posts. We evaluate three clustering algorithms (MR-DBSCAN, DBSCAN, and HDBSCAN) and compare them in terms of Adjusted Rand Index and total run time as a function of the number of posts retrieved and the neighborhood radius. Clustering results show that neighborhood radius has the most significant impact on clustering performance. For small values, the data set is split accordingly, but high values produce a large number of possible partitions and searching for the best partition is hereby time-consuming. With a proper estimated radius, MR-DBSCAN can cluster 50000 forum posts in 46.1 seconds, compared to DBSCAN (143.4) and HDBSCAN (282.3). We conduct an interview with the Danish Cancer Society and present our software prototype. The organization sees a potential in software that can democratize online information about cancer and foresee that such systems will be required in the future.
    Link Prediction with Non-Contrastive Learning. (arXiv:2211.14394v2 [cs.LG] UPDATED)
    A recent focal area in the space of graph neural networks (GNNs) is graph self-supervised learning (SSL), which aims to derive useful node representations without labeled data. Notably, many state-of-the-art graph SSL methods are contrastive methods, which use a combination of positive and negative samples to learn node representations. Owing to challenges in negative sampling (slowness and model sensitivity), recent literature introduced non-contrastive methods, which instead only use positive samples. Though such methods have shown promising performance in node-level tasks, their suitability for link prediction tasks, which are concerned with predicting link existence between pairs of nodes (and have broad applicability to recommendation systems contexts) is yet unexplored. In this work, we extensively evaluate the performance of existing non-contrastive methods for link prediction in both transductive and inductive settings. While most existing non-contrastive methods perform poorly overall, we find that, surprisingly, BGRL generally performs well in transductive settings. However, it performs poorly in the more realistic inductive settings where the model has to generalize to links to/from unseen nodes. We find that non-contrastive models tend to overfit to the training graph and use this analysis to propose T-BGRL, a novel non-contrastive framework that incorporates cheap corruptions to improve the generalization ability of the model. This simple modification strongly improves inductive performance in 5/6 of our datasets, with up to a 120% improvement in Hits@50--all with comparable speed to other non-contrastive baselines and up to 14x faster than the best-performing contrastive baseline. Our work imparts interesting findings about non-contrastive learning for link prediction and paves the way for future researchers to further expand upon this area.
    Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels. (arXiv:2302.05666v2 [cs.CV] UPDATED)
    IoU losses are surrogates that directly optimize the Jaccard index. In semantic segmentation, leveraging IoU losses as part of the loss function is shown to perform better with respect to the Jaccard index measure than optimizing pixel-wise losses such as the cross-entropy loss alone. The most notable IoU losses are the soft Jaccard loss and the Lovasz-Softmax loss. However, these losses are incompatible with soft labels which are ubiquitous in machine learning. In this paper, we propose Jaccard metric losses (JMLs), which are identical to the soft Jaccard loss in a standard setting with hard labels, but are compatible with soft labels. With JMLs, we study two of the most popular use cases of soft labels: label smoothing and knowledge distillation. With a variety of architectures, our experiments show significant improvements over the cross-entropy loss on three semantic segmentation datasets (Cityscapes, PASCAL VOC and DeepGlobe Land), and our simple approach outperforms state-of-the-art knowledge distillation methods by a large margin. Code is available at: \href{https://github.com/zifuwanggg/JDTLosses}{https://github.com/zifuwanggg/JDTLosses}.
    ALUM: Adversarial Data Uncertainty Modeling from Latent Model Uncertainty Compensation. (arXiv:2303.16866v1 [cs.LG])
    It is critical that the models pay attention not only to accuracy but also to the certainty of prediction. Uncertain predictions of deep models caused by noisy data raise significant concerns in trustworthy AI areas. To explore and handle uncertainty due to intrinsic data noise, we propose a novel method called ALUM to simultaneously handle the model uncertainty and data uncertainty in a unified scheme. Rather than solely modeling data uncertainty in the ultimate layer of a deep model based on randomly selected training data, we propose to explore mined adversarial triplets to facilitate data uncertainty modeling and non-parametric uncertainty estimations to compensate for the insufficiently trained latent model layers. Thus, the critical data uncertainty and model uncertainty caused by noisy data can be readily quantified for improving model robustness. Our proposed ALUM is model-agnostic which can be easily implemented into any existing deep model with little extra computation overhead. Extensive experiments on various noisy learning tasks validate the superior robustness and generalization ability of our method. The code is released at https://github.com/wwzjer/ALUM.
    A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning. (arXiv:2210.04428v2 [cs.CV] UPDATED)
    With the success of pretraining techniques in representation learning, a number of continual learning methods based on pretrained models have been proposed. Some of these methods design continual learning mechanisms on the pre-trained representations and only allow minimum updates or even no updates of the backbone models during the training of continual learning. In this paper, we question whether the complexity of these models is needed to achieve good performance by comparing them to a simple baseline that we designed. We argue that the pretrained feature extractor itself can be strong enough to achieve a competitive or even better continual learning performance on Split-CIFAR100 and CoRe 50 benchmarks. To validate this, we conduct a very simple baseline that 1) use the frozen pretrained model to extract image features for every class encountered during the continual learning stage and compute their corresponding mean features on training data, and 2) predict the class of the input based on the nearest neighbor distance between test samples and mean features of the classes; i.e., Nearest Mean Classifier (NMC). This baseline is single-headed, exemplar-free, and can be task-free (by updating the means continually). This baseline achieved 88.53% on 10-Split-CIFAR-100, surpassing most state-of-the-art continual learning methods that are all initialized using the same pretrained transformer model. We hope our baseline may encourage future progress in designing learning systems that can continually add quality to the learning representations even if they started from some pretrained weights.
    COVID-19 Detection Using Segmentation, Region Extraction and Classification Pipeline. (arXiv:2210.02992v4 [eess.IV] UPDATED)
    The main purpose of this study is to develop a pipeline for COVID-19 detection from a big and challenging database of Computed Tomography (CT) images. The proposed pipeline includes a segmentation part, a lung extraction part, and a classifier part. Optional slice removal techniques after UNet-based segmentation of slices were also tried. The methodologies tried in the segmentation part are traditional segmentation methods as well as UNet-based methods. In the classification part, a Convolutional Neural Network (CNN) was used to take the final diagnosis decisions. In terms of the results: in the segmentation part, the proposed segmentation methods show high dice scores on a publicly available dataset. In the classification part, the results were compared at slice-level and at patient-level as well. At slice-level, methods were compared and showed high validation accuracy indicating efficiency in predicting 2D slices. At patient level, the proposed methods were also compared in terms of validation accuracy and macro F1 score on the validation set. The dataset used for classification is COV-19CT Database. The method proposed here showed improvement from our precious results on the same dataset. In Conclusion, the improved work in this paper has potential clinical usages for COVID-19 detection and diagnosis via CT images. The code is on github at https://github.com/IDU-CVLab/COV19D_3rd
    Machine Learning for Synthetic Data Generation: A Review. (arXiv:2302.04062v2 [cs.LG] UPDATED)
    Data plays a crucial role in machine learning. However, in real-world applications, there are several problems with data, e.g., data are of low quality; a limited number of data points lead to under-fitting of the machine learning model; it is hard to access the data due to privacy, safety and regulatory concerns. Synthetic data generation offers a promising new avenue, as it can be shared and used in ways that real-world data cannot. This paper systematically reviews the existing works that leverage machine learning models for synthetic data generation. Specifically, we discuss the synthetic data generation works from several perspectives: (i) applications, including computer vision, speech, natural language, healthcare, and business; (ii) machine learning methods, particularly neural network architectures and deep generative models; (iii) privacy and fairness issue. In addition, we identify the challenges and opportunities in this emerging field and suggest future research directions.
    Privacy-Preserving Logistic Regression Training with A Faster Gradient Variant. (arXiv:2201.10838v3 [cs.CR] UPDATED)
    Logistic regression training over encrypted data has been an attractive idea to security concerns for years. In this paper, we propose a faster gradient variant called $\texttt{quadratic gradient}$ to implement logistic regression training in a homomorphic encryption domain, the core of which can be seen as an extension of the simplified fixed Hessian. We enhance Nesterov's accelerated gradient (NAG) and Adaptive Gradient Algorithm (Adagrad) respectively with this gradient variant and evaluate the enhanced algorithms on several datasets. Experimental results show that the enhanced methods have a state-of-the-art performance in convergence speed compared to the naive first-order gradient methods. We then adopt the enhanced NAG method to implement homomorphic logistic regression training and obtain a comparable result by only $3$ iterations.
    Minimal Width for Universal Property of Deep RNN. (arXiv:2211.13866v2 [stat.ML] UPDATED)
    A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data. Imitating a dynamical system, an infinite-width RNN can approximate any open dynamical system in a compact domain. In general, deep networks with bounded widths are more effective than wide networks in practice; however, the universal approximation theorem for deep narrow structures has yet to be extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound of the minimum width for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or $L^p$ function with the widths $d_x+d_y+2$ and $\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also compute the additional width required if the activation function is $\tanh$ or more. In addition, we prove the universality of other recurrent networks, such as bidirectional RNNs. Bridging a multi-layer perceptron and an RNN, our theory and proof technique can be an initial step toward further research on deep RNNs.
    Improved Kernel Alignment Regret Bound for Online Kernel Learning. (arXiv:2212.12989v2 [cs.LG] UPDATED)
    In this paper, we improve the kernel alignment regret bound for online kernel learning in the regime of the Hinge loss function. Previous algorithm achieves a regret of $O((\mathcal{A}_TT\ln{T})^{\frac{1}{4}})$ at a computational complexity (space and per-round time) of $O(\sqrt{\mathcal{A}_TT\ln{T}})$, where $\mathcal{A}_T$ is called \textit{kernel alignment}. We propose an algorithm whose regret bound and computational complexity are better than previous results. Our results depend on the decay rate of eigenvalues of the kernel matrix. If the eigenvalues of the kernel matrix decay exponentially, then our algorithm enjoys a regret of $O(\sqrt{\mathcal{A}_T})$ at a computational complexity of $O(\ln^2{T})$. Otherwise, our algorithm enjoys a regret of $O((\mathcal{A}_TT)^{\frac{1}{4}})$ at a computational complexity of $O(\sqrt{\mathcal{A}_TT})$. We extend our algorithm to batch learning and obtain a $O(\frac{1}{T}\sqrt{\mathbb{E}[\mathcal{A}_T]})$ excess risk bound which improves the previous $O(1/\sqrt{T})$ bound.
    Emergent Linguistic Structures in Neural Networks are Fragile. (arXiv:2210.17406v7 [cs.LG] UPDATED)
    Large Language Models (LLMs) have been reported to have strong performance on natural language processing tasks. However, performance metrics such as accuracy do not measure the quality of the model in terms of its ability to robustly represent complex linguistic structure. In this paper, focusing on the ability of language models to represent syntax, we propose a framework to assess the consistency and robustness of linguistic representations. To this end, we introduce measures of robustness of neural network models that leverage recent advances in extracting linguistic constructs from LLMs via probing tasks, i.e., simple tasks used to extract meaningful information about a single facet of a language model, such as syntax reconstruction and root identification. Empirically, we study the performance of four LLMs across six different corpora on the proposed robustness measures by analysing their performance and robustness with respect to syntax-preserving perturbations. We provide evidence that context-free representation (e.g., GloVe) are in some cases competitive with context-dependent representations from modern LLMs (e.g., BERT), yet equally brittle to syntax-preserving perturbations. Our key observation is that emergent syntactic representations in neural networks are brittle. We make the code, trained models and logs available to the community as a contribution to the debate about the capabilities of LLMs.
    A Complete Expressiveness Hierarchy for Subgraph GNNs via Subgraph Weisfeiler-Lehman Tests. (arXiv:2302.07090v2 [cs.LG] UPDATED)
    Recently, subgraph GNNs have emerged as an important direction for developing expressive graph neural networks (GNNs). While numerous architectures have been proposed, so far there is still a limited understanding of how various design paradigms differ in terms of expressive power, nor is it clear what design principle achieves maximal expressiveness with minimal architectural complexity. To address these fundamental questions, this paper conducts a systematic study of general node-based subgraph GNNs through the lens of Subgraph Weisfeiler-Lehman Tests (SWL). Our central result is to build a complete hierarchy of SWL with strictly growing expressivity. Concretely, we prove that any node-based subgraph GNN falls into one of the six SWL equivalence classes, among which $\mathsf{SSWL}$ achieves the maximal expressive power. We also study how these equivalence classes differ in terms of their practical expressiveness such as encoding graph distance and biconnectivity. Furthermore, we give a tight expressivity upper bound of all SWL algorithms by establishing a close relation with localized versions of WL and Folklore WL (FWL) tests. Our results provide insights into the power of existing subgraph GNNs, guide the design of new architectures, and point out their limitations by revealing an inherent gap with the 2-FWL test. Finally, experiments demonstrate that $\mathsf{SSWL}$-inspired subgraph GNNs can significantly outperform prior architectures on multiple benchmarks despite great simplicity.
    Multinomial Logistic Regression Algorithms via Quadratic Gradient. (arXiv:2208.06828v2 [cs.LG] UPDATED)
    Multinomial logistic regression, also known by other names such as multiclass logistic regression and softmax regression, is a fundamental classification method that generalizes binary logistic regression to multiclass problems. A recently work proposed a faster gradient called $\texttt{quadratic gradient}$ that can accelerate the binary logistic regression training, and presented an enhanced Nesterov's accelerated gradient (NAG) method for binary logistic regression. In this paper, we extend this work to multiclass logistic regression and propose an enhanced Adaptive Gradient Algorithm (Adagrad) that can accelerate the original Adagrad method. We test the enhanced NAG method and the enhanced Adagrad method on some multiclass-problem datasets. Experimental results show that both enhanced methods converge faster than their original ones respectively.
    Multi-Viewpoint and Multi-Evaluation with Felicitous Inductive Bias Boost Machine Abstract Reasoning Ability. (arXiv:2210.14914v2 [cs.LG] UPDATED)
    Great endeavors have been made to study AI's ability in abstract reasoning, along with which different versions of RAVEN's progressive matrices (RPM) are proposed as benchmarks. Previous works give inkling that without sophisticated design or extra meta-data containing semantic information, neural networks may still be indecisive in making decisions regarding RPM problems, after relentless training. Evidenced by thorough experiments and ablation studies, we showcase that end-to-end neural networks embodied with felicitous inductive bias, intentionally design or serendipitously match, can solve RPM problems elegantly, without the augment of any extra meta-data or preferences of any specific backbone. Our work also reveals that multi-viewpoint with multi-evaluation is a key learning strategy for successful reasoning. Finally, potential explanations for the failure of connectionist models in generalization are provided. We hope that these results will serve as inspections of AI's ability beyond perception and toward abstract reasoning. Source code can be found in https://github.com/QinglaiWeiCASIA/RavenSolver.
    TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in Pre-trained Language Models. (arXiv:2303.15430v2 [cs.CL] UPDATED)
    Pre-trained large language models have recently achieved ground-breaking performance in a wide variety of language understanding tasks. However, the same model can not be applied to multimodal behavior understanding tasks (e.g., video sentiment/humor detection) unless non-verbal features (e.g., acoustic and visual) can be integrated with language. Jointly modeling multiple modalities significantly increases the model complexity, and makes the training process data-hungry. While an enormous amount of text data is available via the web, collecting large-scale multimodal behavioral video datasets is extremely expensive, both in terms of time and money. In this paper, we investigate whether large language models alone can successfully incorporate non-verbal information when they are presented in textual form. We present a way to convert the acoustic and visual information into corresponding textual descriptions and concatenate them with the spoken text. We feed this augmented input to a pre-trained BERT model and fine-tune it on three downstream multimodal tasks: sentiment, humor, and sarcasm detection. Our approach, TextMI, significantly reduces model complexity, adds interpretability to the model's decision, and can be applied for a diverse set of tasks while achieving superior (multimodal sarcasm detection) or near SOTA (multimodal sentiment analysis and multimodal humor detection) performance. We propose TextMI as a general, competitive baseline for multimodal behavioral analysis tasks, particularly in a low-resource setting.
    Diffusion Denoised Smoothing for Certified and Adversarial Robust Out-Of-Distribution Detection. (arXiv:2303.14961v2 [cs.LG] UPDATED)
    As the use of machine learning continues to expand, the importance of ensuring its safety cannot be overstated. A key concern in this regard is the ability to identify whether a given sample is from the training distribution, or is an "Out-Of-Distribution" (OOD) sample. In addition, adversaries can manipulate OOD samples in ways that lead a classifier to make a confident prediction. In this study, we present a novel approach for certifying the robustness of OOD detection within a $\ell_2$-norm around the input, regardless of network architecture and without the need for specific components or additional training. Further, we improve current techniques for detecting adversarial attacks on OOD samples, while providing high levels of certified and adversarial robustness on in-distribution samples. The average of all OOD detection metrics on CIFAR10/100 shows an increase of $\sim 13 \% / 5\%$ relative to previous approaches.
    Foundation Models and Fair Use. (arXiv:2303.15715v1 [cs.CY] CROSS LISTED)
    Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. However, there is a caveat: If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model. In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea and more work is needed to develop policies that address the potential harms of foundation models.
    Beyond RMSE: Do machine-learned models of road user interaction produce human-like behavior?. (arXiv:2206.11110v2 [cs.LG] UPDATED)
    Autonomous vehicles use a variety of sensors and machine-learned models to predict the behavior of surrounding road users. Most of the machine-learned models in the literature focus on quantitative error metrics like the root mean square error (RMSE) to learn and report their models' capabilities. This focus on quantitative error metrics tends to ignore the more important behavioral aspect of the models, raising the question of whether these models really predict human-like behavior. Thus, we propose to analyze the output of machine-learned models much like we would analyze human data in conventional behavioral research. We introduce quantitative metrics to demonstrate presence of three different behavioral phenomena in a naturalistic highway driving dataset: 1) The kinematics-dependence of who passes a merging point first 2) Lane change by an on-highway vehicle to accommodate an on-ramp vehicle 3) Lane changes by vehicles on the highway to avoid lead vehicle conflicts. Then, we analyze the behavior of three machine-learned models using the same metrics. Even though the models' RMSE value differed, all the models captured the kinematic-dependent merging behavior but struggled at varying degrees to capture the more nuanced courtesy lane change and highway lane change behavior. Additionally, the collision aversion analysis during lane changes showed that the models struggled to capture the physical aspect of human driving: leaving adequate gap between the vehicles. Thus, our analysis highlighted the inadequacy of simple quantitative metrics and the need to take a broader behavioral perspective when analyzing machine-learned models of human driving predictions.
    Improving Transfer Learning with a Dual Image and Video Transformer for Multi-label Movie Trailer Genre Classification. (arXiv:2210.07983v4 [cs.CV] UPDATED)
    In this paper, we study the transferability of ImageNet spatial and Kinetics spatio-temporal representations to multi-label Movie Trailer Genre Classification (MTGC). In particular, we present an extensive evaluation of the transferability of ConvNet and Transformer models pretrained on ImageNet and Kinetics to Trailers12k, a new manually-curated movie trailer dataset composed of 12,000 videos labeled with 10 different genres and associated metadata. We analyze different aspects that can influence transferability, such as frame rate, input video extension, and spatio-temporal modeling. In order to reduce the spatio-temporal structure gap between ImageNet/Kinetics and Trailers12k, we propose Dual Image and Video Transformer Architecture (DIViTA), which performs shot detection so as to segment the trailer into highly correlated clips, providing a more cohesive input for pretrained backbones and improving transferability (a 1.83% increase for ImageNet and 3.75% for Kinetics). Our results demonstrate that representations learned on either ImageNet or Kinetics are comparatively transferable to Trailers12k. Moreover, both datasets provide complementary information that can be combined to improve classification performance (a 2.91% gain compared to the top single pretraining). Interestingly, using lightweight ConvNets as pretrained backbones resulted in only a 3.46% drop in classification performance compared with the top Transformer while requiring only 11.82% of its parameters and 0.81% of its FLOPS.
    Motif-aware temporal GCN for fraud detection in signed cryptocurrency trust networks. (arXiv:2211.13123v2 [cs.LG] UPDATED)
    Graph convolutional networks (GCNs) is a class of artificial neural networks for processing data that can be represented as graphs. Since financial transactions can naturally be constructed as graphs, GCNs are widely applied in the financial industry, especially for financial fraud detection. In this paper, we focus on fraud detection on cryptocurrency truct networks. In the literature, most works focus on static networks. Whereas in this study, we consider the evolving nature of cryptocurrency networks, and use local structural as well as the balance theory to guide the training process. More specifically, we compute motif matrices to capture the local topological information, then use them in the GCN aggregation process. The generated embedding at each snapshot is a weighted average of embeddings within a time window, where the weights are learnable parameters. Since the trust networks is signed on each edge, balance theory is used to guide the training process. Experimental results on bitcoin-alpha and bitcoin-otc datasets show that the proposed model outperforms those in the literature.
    From {Solution Synthesis} to {Student Attempt Synthesis} for Block-Based Visual Programming Tasks. (arXiv:2205.01265v3 [cs.AI] UPDATED)
    Block-based visual programming environments are increasingly used to introduce computing concepts to beginners. Given that programming tasks are open-ended and conceptual, novice students often struggle when learning in these environments. AI-driven programming tutors hold great promise in automatically assisting struggling students, and need several components to realize this potential. We investigate the crucial component of student modeling, in particular, the ability to automatically infer students' misconceptions for predicting (synthesizing) their behavior. We introduce a novel benchmark, StudentSyn, centered around the following challenge: For a given student, synthesize the student's attempt on a new target task after observing the student's attempt on a fixed reference task. This challenge is akin to that of program synthesis; however, instead of synthesizing a {solution} (i.e., program an expert would write), the goal here is to synthesize a {student attempt} (i.e., program that a given student would write). We first show that human experts (TutorSS) can achieve high performance on the benchmark, whereas simple baselines perform poorly. Then, we develop two neuro/symbolic techniques (NeurSS and SymSS) in a quest to close this gap with TutorSS.
    Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation. (arXiv:2207.04475v2 [stat.ML] UPDATED)
    This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$ for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated by (asymptotically) unbiased observations $\{(\mathbf{A}(Z_n),\mathbf{b}(Z_n))\}_{n \in \mathbb{N}}$. We consider here the case where $\{Z_n\}_{n \in \mathbb{N}}$ is an i.i.d. sequence or a uniformly geometrically ergodic Markov chain. We derive $p$-th moment and high-probability deviation bounds for the iterates defined by LSA and its Polyak-Ruppert-averaged version. Our finite-time instance-dependent bounds for the averaged LSA iterates are sharp in the sense that the leading term we obtain coincides with the local asymptotic minimax limit. Moreover, the remainder terms of our bounds admit a tight dependence on the mixing time $t_{\operatorname{mix}}$ of the underlying chain and the norm of the noise variables. We emphasize that our result requires the SA step size to scale only with logarithm of the problem dimension $d$.
    Quadratic Gradient: Combining Gradient Algorithms and Newton's Method as One. (arXiv:2209.03282v2 [math.OC] UPDATED)
    It might be inadequate for the line search technique for Newton's method to use only one floating point number. A column vector of the same size as the gradient might be better than a mere float number to accelerate each of the gradient elements with different rates. Moreover, a square matrix of the same order as the Hessian matrix might be helpful to correct the Hessian matrix. Chiang applied something between a column vector and a square matrix, namely a diagonal matrix, to accelerate the gradient and further proposed a faster gradient variant called quadratic gradient. In this paper, we present a new way to build a new version of the quadratic gradient. This new quadratic gradient doesn't satisfy the convergence conditions of the fixed Hessian Newton's method. However, experimental results show that it sometimes has a better performance than the original one in convergence rate. Also, Chiang speculates that there might be a relation between the Hessian matrix and the learning rate for the first-order gradient descent method. We prove that the floating number $\frac{1}{\epsilon + \max \{| \lambda_i | \}}$ can be a good learning rate of the gradient methods, where $\epsilon$ is a number to avoid division by zero and $\lambda_i$ the eigenvalues of the Hessian matrix.
    Attention in a family of Boltzmann machines emerging from modern Hopfield networks. (arXiv:2212.04692v2 [cs.LG] UPDATED)
    Hopfield networks and Boltzmann machines (BMs) are fundamental energy-based neural network models. Recent studies on modern Hopfield networks have broaden the class of energy functions and led to a unified perspective on general Hopfield networks including an attention module. In this letter, we consider the BM counterparts of modern Hopfield networks using the associated energy functions, and study their salient properties from a trainability perspective. In particular, the energy function corresponding to the attention module naturally introduces a novel BM, which we refer to as the attentional BM (AttnBM). We verify that AttnBM has a tractable likelihood function and gradient for certain special cases and is easy to train. Moreover, we reveal the hidden connections between AttnBM and some single-layer models, namely the Gaussian--Bernoulli restricted BM and the denoising autoencoder with softmax units coming from denoising score matching. We also investigate BMs introduced by other energy functions and show that the energy function of dense associative memory models gives BMs belonging to Exponential Family Harmoniums.
    MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks. (arXiv:2303.16839v1 [cs.CV])
    The development of language models have moved from encoder-decoder to decoder-only designs. In addition, the common knowledge has it that the two most popular multimodal tasks, the generative and contrastive tasks, tend to conflict with one another, are hard to accommodate in one architecture, and further need complex adaptations for downstream tasks. We propose a novel paradigm of training with a decoder-only model for multimodal tasks, which is surprisingly effective in jointly learning of these disparate vision-language tasks. This is done with a simple model, called MaMMUT. It consists of a single vision encoder and a text decoder, and is able to accommodate contrastive and generative learning by a novel two-pass approach on the text decoder. We demonstrate that joint training of these diverse-objective tasks is simple, effective, and maximizes the weight-sharing of the model. Furthermore, the same architecture enables straightforward extensions to open-vocabulary object detection and video-language tasks. The model tackles a diverse range of tasks, while being modest in capacity. Our model achieves the SOTA on image-text and text-image retrieval, video question answering and open-vocabulary detection tasks, outperforming much larger and more extensively trained foundational models. It shows competitive results on VQA and Video Captioning, especially considering its size. Ablations confirm the flexibility and advantages of our approach.
    Inflation forecasting with attention based transformer neural networks. (arXiv:2303.15364v2 [econ.EM] UPDATED)
    Inflation is a major determinant for allocation decisions and its forecast is a fundamental aim of governments and central banks. However, forecasting inflation is not a trivial task, as its prediction relies on low frequency, highly fluctuating data with unclear explanatory variables. While classical models show some possibility of predicting inflation, reliably beating the random walk benchmark remains difficult. Recently, (deep) neural networks have shown impressive results in a multitude of applications, increasingly setting the new state-of-the-art. This paper investigates the potential of the transformer deep neural network architecture to forecast different inflation rates. The results are compared to a study on classical time series and machine learning models. We show that our adapted transformer, on average, outperforms the baseline in 6 out of 16 experiments, showing best scores in two out of four investigated inflation rates. Our results demonstrate that a transformer based neural network can outperform classical regression and machine learning models in certain inflation rates and forecasting horizons.
    Bi-directional Training for Composed Image Retrieval via Text Prompt Learning. (arXiv:2303.16604v1 [cs.CV])
    Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes. Existing approaches to solving this challenging task learn a mapping from the (reference image, modification text)-pair to an image embedding that is then matched against a large image corpus. One area that has not yet been explored is the reverse direction, which asks the question, what reference image when modified as describe by the text would produce the given target image? In this work we propose a bi-directional training scheme that leverages such reversed queries and can be applied to existing composed image retrieval architectures. To encode the bi-directional query we prepend a learnable token to the modification text that designates the direction of the query and then finetune the parameters of the text embedding module. We make no other changes to the network architecture. Experiments on two standard datasets show that our novel approach achieves improved performance over a baseline BLIP-based model that itself already achieves state-of-the-art performance.
    InceptionNeXt: When Inception Meets ConvNeXt. (arXiv:2303.16900v1 [cs.CV])
    Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely studied and adopted recently to enlarge the receptive field and improve model performance, like the remarkable work ConvNeXt which employs 7x7 depthwise convolution. Although such depthwise operator only consumes a few FLOPs, it largely harms the model efficiency on powerful computing devices due to the high memory access costs. For example, ConvNeXt-T has similar FLOPs with ResNet-50 but only achieves 60% throughputs when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt can improve speed, it results in significant performance degradation. It is still unclear how to speed up large-kernel-based CNN models while preserving their performance. To tackle this issue, inspired by Inceptions, we propose to decompose large-kernel depthwise convolution into four parallel branches along channel dimension, i.e. small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, we build a series of networks, namely IncepitonNeXt, which not only enjoy high throughputs but also maintain competitive performance. For instance, InceptionNeXt-T achieves 1.6x higher training throughputs than ConvNeX-T, as well as attains 0.2% top-1 accuracy improvement on ImageNet-1K. We anticipate InceptionNeXt can serve as an economical baseline for future architecture design to reduce carbon footprint. Code is available at https://github.com/sail-sg/inceptionnext.
    Not cool, calm or collected: Using emotional language to detect COVID-19 misinformation. (arXiv:2303.16777v1 [cs.CL])
    COVID-19 misinformation on social media platforms such as twitter is a threat to effective pandemic management. Prior works on tweet COVID-19 misinformation negates the role of semantic features common to twitter such as charged emotions. Thus, we present a novel COVID-19 misinformation model, which uses both a tweet emotion encoder and COVID-19 misinformation encoder to predict whether a tweet contains COVID-19 misinformation. Our emotion encoder was fine-tuned on a novel annotated dataset and our COVID-19 misinformation encoder was fine-tuned on a subset of the COVID-HeRA dataset. Experimental results show superior results using the combination of emotion and misinformation encoders as opposed to a misinformation classifier alone. Furthermore, extensive result analysis was conducted, highlighting low quality labels and mismatched label distributions as key limitations to our study.
    GRAF: Graph Attention-aware Fusion Networks. (arXiv:2303.16781v1 [cs.LG])
    A large number of real-world networks include multiple types of nodes and edges. Graph Neural Network (GNN) emerged as a deep learning framework to utilize node features on graph-structured data showing superior performance. However, popular GNN-based architectures operate on one homogeneous network. Enabling them to work on multiple networks brings additional challenges due to the heterogeneity of the networks and the multiplicity of the existing associations. In this study, we present a computational approach named GRAF utilizing GNN-based approaches on multiple networks with the help of attention mechanisms and network fusion. Using attention-based neighborhood aggregation, GRAF learns the importance of each neighbor per node (called node-level attention) followed by the importance of association (called association-level attention) in a hierarchical way. Then, GRAF processes a network fusion step weighing each edge according to learned node- and association-level attention, which results in a fused enriched network. Considering that the fused network could be a highly dense network with many weak edges depending on the given input networks, we included an edge elimination step with respect to edges' weights. Finally, GRAF utilizes Graph Convolutional Network (GCN) on the fused network and incorporates the node features on the graph-structured data for the prediction task or any other downstream analysis. Our extensive evaluations of prediction tasks from different domains showed that GRAF outperformed the state-of-the-art methods. Utilization of learned node-level and association-level attention allowed us to prioritize the edges properly. The source code for our tool is publicly available at https://github.com/bozdaglab/GRAF.
    IOP-FL: Inside-Outside Personalization for Federated Medical Image Segmentation. (arXiv:2204.08467v2 [eess.IV] UPDATED)
    Federated learning (FL) allows multiple medical institutions to collaboratively learn a global model without centralizing client data. It is difficult, if possible at all, for such a global model to commonly achieve optimal performance for each individual client, due to the heterogeneity of medical images from various scanners and patient demographics. This problem becomes even more significant when deploying the global model to unseen clients outside the FL with unseen distributions not presented during federated training. To optimize the prediction accuracy of each individual client for medical imaging tasks, we propose a novel unified framework for both \textit{Inside and Outside model Personalization in FL} (IOP-FL). Our inside personalization uses a lightweight gradient-based approach that exploits the local adapted model for each client, by accumulating both the global gradients for common knowledge and the local gradients for client-specific optimization. Moreover, and importantly, the obtained local personalized models and the global model can form a diverse and informative routing space to personalize an adapted model for outside FL clients. Hence, we design a new test-time routing scheme using the consistency loss with a shape constraint to dynamically incorporate the models, given the distribution information conveyed by the test data. Our extensive experimental results on two medical image segmentation tasks present significant improvements over SOTA methods on both inside and outside personalization, demonstrating the potential of our IOP-FL scheme for clinical practice.
    Global Convergence of Over-parameterized Deep Equilibrium Models. (arXiv:2205.13814v2 [cs.LG] UPDATED)
    A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection. Instead of infinite computations, it solves an equilibrium point directly with root-finding and computes gradients with implicit differentiation. The training dynamics of over-parameterized DEQs are investigated in this study. By supposing a condition on the initial equilibrium point, we show that the unique equilibrium point always exists during the training process, and the gradient descent is proved to converge to a globally optimal solution at a linear convergence rate for the quadratic loss function. In order to show that the required initial condition is satisfied via mild over-parameterization, we perform a fine-grained analysis on random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
    Interpolating Discriminant Functions in High-Dimensional Gaussian Latent Mixtures. (arXiv:2210.14347v2 [stat.ML] UPDATED)
    This paper considers binary classification of high-dimensional features under a postulated model with a low-dimensional latent Gaussian mixture structure and non-vanishing noise. A generalized least squares estimator is used to estimate the direction of the optimal separating hyperplane. The estimated hyperplane is shown to interpolate on the training data. While the direction vector can be consistently estimated as could be expected from recent results in linear regression, a naive plug-in estimate fails to consistently estimate the intercept. A simple correction, that requires an independent hold-out sample, renders the procedure minimax optimal in many scenarios. The interpolation property of the latter procedure can be retained, but surprisingly depends on the way the labels are encoded.
    Towards Understanding the Effect of Pretraining Label Granularity. (arXiv:2303.16887v1 [cs.CV])
    In this paper, we study how pretraining label granularity affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting where the pretraining label is more fine-grained than that of the target problem. We experiment with this method using the label hierarchy of iNaturalist 2021, and observe a 8.76% relative improvement of the error rate over the baseline. We find the following conditions are key for the improvement: 1) the pretraining dataset has a strong and meaningful label hierarchy, 2) its label function strongly aligns with that of the target task, and most importantly, 3) an appropriate level of pretraining label granularity is chosen. The importance of pretraining label granularity is further corroborated by our transfer learning experiments on ImageNet. Most notably, we show that pretraining at the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining at other coarser granularity levels, which supports the common practice. Theoretically, through an analysis on a two-layer convolutional ReLU network, we prove that: 1) models trained on coarse-grained labels only respond strongly to the common or "easy-to-learn" features; 2) with the dataset satisfying the right conditions, fine-grained pretraining encourages the model to also learn rarer or "harder-to-learn" features well, thus improving the model's generalization.
    Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness. (arXiv:2006.13726v4 [cs.CV] UPDATED)
    Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients have previously been found to exist in many defense methods and cause a false signal of robustness. In this paper, we identify a more subtle situation called Imbalanced Gradients that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack towards to a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We also propose a multi-targeted and ensemble version of our MD attack. By investigating 24 defense models proposed since 2018, we find that 11 models are susceptible to a certain degree of imbalanced gradients and our MD attack can decrease their robustness evaluated by the best standalone baseline attack by more than 1%. We also provide an in-depth investigation on the likely causes of imbalanced gradients and effective countermeasures. Our code is available at https://github.com/HanxunH/MDAttack.
    Compute and Energy Consumption Trends in Deep Learning Inference. (arXiv:2109.05472v2 [cs.LG] UPDATED)
    The progress of some AI paradigms such as deep learning is said to be linked to an exponential growth in the number of parameters. There are many studies corroborating these trends, but does this translate into an exponential increase in energy consumption? In order to answer this question we focus on inference costs rather than training costs, as the former account for most of the computing effort, solely because of the multiplicative factors. Also, apart from algorithmic innovations, we account for more specific and powerful hardware (leading to higher FLOPS) that is usually accompanied with important energy efficiency optimisations. We also move the focus from the first implementation of a breakthrough paper towards the consolidated version of the techniques one or two year later. Under this distinctive and comprehensive perspective, we study relevant models in the areas of computer vision and natural language processing: for a sustained increase in performance we see a much softer growth in energy consumption than previously anticipated. The only caveat is, yet again, the multiplicative factor, as future AI increases penetration and becomes more pervasive.
    Single-Pass Contrastive Learning Can Work for Both Homophilic and Heterophilic Graph. (arXiv:2211.10890v2 [cs.LG] UPDATED)
    Existing graph contrastive learning (GCL) techniques typically require two forward passes for a single instance to construct the contrastive loss, which is effective for capturing the low-frequency signals of node features. Such a dual-pass design has shown empirical success on homophilic graphs, but its effectiveness on heterophilic graphs, where directly connected nodes typically have different labels, is unknown. In addition, existing GCL approaches fail to provide strong performance guarantees. Coupled with the unpredictability of GCL approaches on heterophilic graphs, their applicability in real-world contexts is limited. Then, a natural question arises: Can we design a GCL method that works for both homophilic and heterophilic graphs with a performance guarantee? To answer this question, we theoretically study the concentration property of features obtained by neighborhood aggregation on homophilic and heterophilic graphs, introduce the single-pass graph contrastive learning loss based on the property, and provide performance guarantees for the minimizer of the loss on downstream tasks. As a direct consequence of our analysis, we implement the Single-Pass Graph Contrastive Learning method (SP-GCL). Empirically, on 14 benchmark datasets with varying degrees of homophily, the features learned by the SP-GCL can match or outperform existing strong baselines with significantly less computational overhead, which demonstrates the usefulness of our findings in real-world cases.
    A Hierarchical Game-Theoretic Decision-Making for Cooperative Multi-Agent Systems Under the Presence of Adversarial Agents. (arXiv:2303.16641v1 [cs.MA])
    Underlying relationships among Multi-Agent Systems (MAS) in hazardous scenarios can be represented as Game-theoretic models. This paper proposes a new hierarchical network-based model called Game-theoretic Utility Tree (GUT), which decomposes high-level strategies into executable low-level actions for cooperative MAS decisions. It combines with a new payoff measure based on agent needs for real-time strategy games. We present an Explore game domain, where we measure the performance of MAS achieving tasks from the perspective of balancing the success probability and system costs. We evaluate the GUT approach against state-of-the-art methods that greedily rely on rewards of the composite actions. Conclusive results on extensive numerical simulations indicate that GUT can organize more complex relationships among MAS cooperation, helping the group achieve challenging tasks with lower costs and higher winning rates. Furthermore, we demonstrated the applicability of the GUT using the simulator-hardware testbed - Robotarium. The performances verified the effectiveness of the GUT in the real robot application and validated that the GUT could effectively organize MAS cooperation strategies, helping the group with fewer advantages achieve higher performance.
    Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data. (arXiv:2206.08330v2 [cs.LG] UPDATED)
    We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. In C-VFL, a server and multiple parties collaboratively train a model on their respective features utilizing several local iterations and sharing compressed intermediate results periodically. Our work provides the first theoretical analysis of the effect message compression has on distributed training over vertically partitioned data. We prove convergence of non-convex objectives at a rate of $O(\frac{1}{\sqrt{T}})$ when the compression error is bounded over the course of training. We provide specific requirements for convergence with common compression techniques, such as quantization and top-$k$ sparsification. Finally, we experimentally show compression can reduce communication by over $90\%$ without a significant decrease in accuracy over VFL without compression.
    The Hidden-Manifold Hopfield Model and a learning phase transition. (arXiv:2303.16880v1 [cond-mat.dis-nn])
    The Hopfield model has a long-standing tradition in statistical physics, being one of the few neural networks for which a theory is available. Extending the theory of Hopfield models for correlated data could help understand the success of deep neural networks, for instance describing how they extract features from data. Motivated by this, we propose and investigate a generalized Hopfield model that we name Hidden-Manifold Hopfield Model: we generate the couplings from $P=\alpha N$ examples with the Hebb rule using a non-linear transformation of $D=\alpha_D N$ random vectors that we call factors, with $N$ the number of neurons. Using the replica method, we obtain a phase diagram for the model that shows a phase transition where the factors hidden in the examples become attractors of the dynamics; this phase exists above a critical value of $\alpha$ and below a critical value of $\alpha_D$. We call this behaviour learning transition.
    The Prominence of Artificial Intelligence in COVID-19. (arXiv:2111.09537v3 [cs.LG] UPDATED)
    In December 2019, a novel virus called COVID-19 had caused an enormous number of causalities to date. The battle with the novel Coronavirus is baffling and horrifying after the Spanish Flu 2019. While the front-line doctors and medical researchers have made significant progress in controlling the spread of the highly contiguous virus, technology has also proved its significance in the battle. Moreover, Artificial Intelligence has been adopted in many medical applications to diagnose many diseases, even baffling experienced doctors. Therefore, this survey paper explores the methodologies proposed that can aid doctors and researchers in early and inexpensive methods of diagnosis of the disease. Most developing countries have difficulties carrying out tests using the conventional manner, but a significant way can be adopted with Machine and Deep Learning. On the other hand, the access to different types of medical images has motivated the researchers. As a result, a mammoth number of techniques are proposed. This paper first details the background knowledge of the conventional methods in the Artificial Intelligence domain. Following that, we gather the commonly used datasets and their use cases to date. In addition, we also show the percentage of researchers adopting Machine Learning over Deep Learning. Thus we provide a thorough analysis of this scenario. Lastly, in the research challenges, we elaborate on the problems faced in COVID-19 research, and we address the issues with our understanding to build a bright and healthy environment.
    PAC-Bayesian bounds for learning LTI-ss systems with input from empirical loss. (arXiv:2303.16816v1 [stat.ML])
    In this paper we derive a Probably Approxilmately Correct(PAC)-Bayesian error bound for linear time-invariant (LTI) stochastic dynamical systems with inputs. Such bounds are widespread in machine learning, and they are useful for characterizing the predictive power of models learned from finitely many data points. In particular, with the bound derived in this paper relates future average prediction errors with the prediction error generated by the model on the data used for learning. In turn, this allows us to provide finite-sample error bounds for a wide class of learning/system identification algorithms. Furthermore, as LTI systems are a sub-class of recurrent neural networks (RNNs), these error bounds could be a first step towards PAC-Bayesian bounds for RNNs.
    What Does the Gradient Tell When Attacking the Graph Structure. (arXiv:2208.12815v2 [cs.LG] UPDATED)
    Recent research has revealed that Graph Neural Networks (GNNs) are susceptible to adversarial attacks targeting the graph structure. A malicious attacker can manipulate a limited number of edges, given the training labels, to impair the victim model's performance. Previous empirical studies indicate that gradient-based attackers tend to add edges rather than remove them. In this paper, we present a theoretical demonstration revealing that attackers tend to increase inter-class edges due to the message passing mechanism of GNNs, which explains some previous empirical observations. By connecting dissimilar nodes, attackers can more effectively corrupt node features, making such attacks more advantageous. However, we demonstrate that the inherent smoothness of GNN's message passing tends to blur node dissimilarity in the feature space, leading to the loss of crucial information during the forward process. To address this issue, we propose a novel surrogate model with multi-level propagation that preserves the node dissimilarity information. This model parallelizes the propagation of unaggregated raw features and multi-hop aggregated features, while introducing batch normalization to enhance the dissimilarity in node representations and counteract the smoothness resulting from topological aggregation. Our experiments show significant improvement with our approach.Furthermore, both theoretical and experimental evidence suggest that adding inter-class edges constitutes an easily observable attack pattern. We propose an innovative attack loss that balances attack effectiveness and imperceptibility, sacrificing some attack effectiveness to attain greater imperceptibility. We also provide experiments to validate the compromise performance achieved through this attack loss.
    Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits. (arXiv:2206.05404v3 [stat.ML] UPDATED)
    We propose a linear contextual bandit algorithm with $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ isthe time horizon. Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, our proposed estimator takes contributions either from contexts of all arms or from selected contexts. We establish a self-normalized bound for our estimator, which allows a novel decomposition of the cumulative regret into \textit{additive} dimension-dependent terms instead of multiplicative terms. We also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem setting. Hence, the regret of our proposed algorithm matches the lower bound up to logarithmic factors. The numerical experiments support the theoretical guarantees and show that our proposed method outperforms the existing linear bandit algorithms.
    Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations. (arXiv:2209.11908v5 [cs.LG] UPDATED)
    Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.
    Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI. (arXiv:2211.09702v2 [cs.NI] UPDATED)
    We introduce a novel deep reinforcement learning (DRL) approach to jointly optimize transmit beamforming and reconfigurable intelligent surface (RIS) phase shifts in a multiuser multiple input single output (MU-MISO) system to maximize the sum downlink rate under the phase-dependent reflection amplitude model. Our approach addresses the challenge of imperfect channel state information (CSI) and hardware impairments by considering a practical RIS amplitude model. We compare the performance of our approach against a vanilla DRL agent in two scenarios: perfect CSI and phase-dependent RIS amplitudes, and mismatched CSI and ideal RIS reflections. The results demonstrate that the proposed framework significantly outperforms the vanilla DRL agent under mismatch and approaches the golden standard. Our contributions include modifications to the DRL approach to address the joint design of transmit beamforming and phase shifts and the phase-dependent amplitude model. To the best of our knowledge, our method is the first DRL-based approach for the phase-dependent reflection amplitude model in RIS-aided MU-MISO systems. Our findings in this study highlight the potential of our approach as a promising solution to overcome hardware impairments in RIS-aided wireless communication systems.
    Learning Identity-Preserving Transformations on Data Manifolds. (arXiv:2106.12096v2 [cs.LG] UPDATED)
    Many machine learning techniques incorporate identity-preserving transformations into their models to generalize their performance to previously unseen data. These transformations are typically selected from a set of functions that are known to maintain the identity of an input when applied (e.g., rotation, translation, flipping, and scaling). However, there are many natural variations that cannot be labeled for supervision or defined through examination of the data. As suggested by the manifold hypothesis, many of these natural variations live on or near a low-dimensional, nonlinear manifold. Several techniques represent manifold variations through a set of learned Lie group operators that define directions of motion on the manifold. However, these approaches are limited because they require transformation labels when training their models and they lack a method for determining which regions of the manifold are appropriate for applying each specific operator. We address these limitations by introducing a learning strategy that does not require transformation labels and developing a method that learns the local regions where each operator is likely to be used while preserving the identity of inputs. Experiments on MNIST and Fashion MNIST highlight our model's ability to learn identity-preserving transformations on multi-class datasets. Additionally, we train on CelebA to showcase our model's ability to learn semantically meaningful transformations on complex datasets in an unsupervised manner.
    Systematic human learning and generalization from a brief tutorial with explanatory feedback. (arXiv:2107.06994v2 [cs.LG] UPDATED)
    Neural networks have long been used to model human intelligence, capturing elements of behavior and cognition, and their neural basis. Recent advancements in deep learning have enabled neural network models to reach and even surpass human levels of intelligence in many respects, yet unlike humans, their ability to learn new tasks quickly remains a challenge. People can reason not only in familiar domains, but can also rapidly learn to reason through novel problems and situations, raising the question of how well modern neural network models capture human intelligence and in which ways they diverge. In this work, we explore this gap by investigating human adults' ability to learn an abstract reasoning task based on Sudoku from a brief instructional tutorial with explanatory feedback for incorrect responses using a narrow range of training examples. We find that participants who master the task do so within a small number of trials and generalize well to puzzles outside of the training range. We also find that most of those who master the task can describe a valid solution strategy, and such participants perform better on transfer puzzles than those whose strategy descriptions are vague or incomplete. Interestingly, fewer than half of our human participants were successful in acquiring a valid solution strategy, and this ability is associated with high school mathematics education. We consider the challenges these findings pose for building computational models that capture all aspects of our findings and point toward a possible role for learning to engage in explanation-based reasoning to support rapid learning and generalization.
    Maximum likelihood method revisited: Gauge symmetry in Kullback -- Leibler divergence and performance-guaranteed regularization. (arXiv:2303.16721v1 [stat.ML])
    The maximum likelihood method is the best-known method for estimating the probabilities behind the data. However, the conventional method obtains the probability model closest to the empirical distribution, resulting in overfitting. Then regularization methods prevent the model from being excessively close to the wrong probability, but little is known systematically about their performance. The idea of regularization is similar to error-correcting codes, which obtain optimal decoding by mixing suboptimal solutions with an incorrectly received code. The optimal decoding in error-correcting codes is achieved based on gauge symmetry. We propose a theoretically guaranteed regularization in the maximum likelihood method by focusing on a gauge symmetry in Kullback -- Leibler divergence. In our approach, we obtain the optimal model without the need to search for hyperparameters frequently appearing in regularization.
    ACO-tagger: A Novel Method for Part-of-Speech Tagging using Ant Colony Optimization. (arXiv:2303.16760v1 [cs.CL])
    Swarm Intelligence algorithms have gained significant attention in recent years as a means of solving complex and non-deterministic problems. These algorithms are inspired by the collective behavior of natural creatures, and they simulate this behavior to develop intelligent agents for computational tasks. One such algorithm is Ant Colony Optimization (ACO), which is inspired by the foraging behavior of ants and their pheromone laying mechanism. ACO is used for solving difficult problems that are discrete and combinatorial in nature. Part-of-Speech (POS) tagging is a fundamental task in natural language processing that aims to assign a part-of-speech role to each word in a sentence. In this research paper, proposed a high-performance POS-tagging method based on ACO called ACO-tagger. This method achieved a high accuracy rate of 96.867%, outperforming several state-of-the-art methods. The proposed method is fast and efficient, making it a viable option for practical applications.
    Optimal approximation of $C^k$-functions using shallow complex-valued neural networks. (arXiv:2303.16813v1 [math.FA])
    We prove a quantitative result for the approximation of functions of regularity $C^k$ (in the sense of real variables) defined on the complex cube $\Omega_n := [-1,1]^n +i[-1,1]^n\subseteq \mathbb{C}^n$ using shallow complex-valued neural networks. Precisely, we consider neural networks with a single hidden layer and $m$ neurons, i.e., networks of the form $z \mapsto \sum_{j=1}^m \sigma_j \cdot \phi\big(\rho_j^T z + b_j\big)$ and show that one can approximate every function in $C^k \left( \Omega_n; \mathbb{C}\right)$ using a function of that form with error of the order $m^{-k/(2n)}$ as $m \to \infty$, provided that the activation function $\phi: \mathbb{C} \to \mathbb{C}$ is smooth but not polyharmonic on some non-empty open set. Furthermore, we show that the selection of the weights $\sigma_j, b_j \in \mathbb{C}$ and $\rho_j \in \mathbb{C}^n$ is continuous with respect to $f$ and prove that the derived rate of approximation is optimal under this continuity assumption. We also discuss the optimality of the result for a possibly discontinuous choice of the weights.
    Machine Learning for Uncovering Biological Insights in Spatial Transcriptomics Data. (arXiv:2303.16725v1 [q-bio.QM])
    Development and homeostasis in multicellular systems both require exquisite control over spatial molecular pattern formation and maintenance. Advances in spatially-resolved and high-throughput molecular imaging methods such as multiplexed immunofluorescence and spatial transcriptomics (ST) provide exciting new opportunities to augment our fundamental understanding of these processes in health and disease. The large and complex datasets resulting from these techniques, particularly ST, have led to rapid development of innovative machine learning (ML) tools primarily based on deep learning techniques. These ML tools are now increasingly featured in integrated experimental and computational workflows to disentangle signals from noise in complex biological systems. However, it can be difficult to understand and balance the different implicit assumptions and methodologies of a rapidly expanding toolbox of analytical tools in ST. To address this, we summarize major ST analysis goals that ML can help address and current analysis trends. We also describe four major data science concepts and related heuristics that can help guide practitioners in their choices of the right tools for the right biological questions.
    Decision Making for Autonomous Driving in Interactive Merge Scenarios via Learning-based Prediction. (arXiv:2303.16821v1 [cs.RO])
    Autonomous agents that drive on roads shared with human drivers must reason about the nuanced interactions among traffic participants. This poses a highly challenging decision making problem since human behavior is influenced by a multitude of factors (e.g., human intentions and emotions) that are hard to model. This paper presents a decision making approach for autonomous driving, focusing on the complex task of merging into moving traffic where uncertainty emanates from the behavior of other drivers and imperfect sensor measurements. We frame the problem as a partially observable Markov decision process (POMDP) and solve it online with Monte Carlo tree search. The solution to the POMDP is a policy that performs high-level driving maneuvers, such as giving way to an approaching car, keeping a safe distance from the vehicle in front or merging into traffic. Our method leverages a model learned from data to predict the future states of traffic while explicitly accounting for interactions among the surrounding agents. From these predictions, the autonomous vehicle can anticipate the future consequences of its actions on the environment and optimize its trajectory accordingly. We thoroughly test our approach in simulation, showing that the autonomous vehicle can adapt its behavior to different situations. We also compare against other methods, demonstrating an improvement with respect to the considered performance metrics.
    Multi-View Clustering via Semi-non-negative Tensor Factorization. (arXiv:2303.16748v1 [cs.LG])
    Multi-view clustering (MVC) based on non-negative matrix factorization (NMF) and its variants have received a huge amount of attention in recent years due to their advantages in clustering interpretability. However, existing NMF-based multi-view clustering methods perform NMF on each view data respectively and ignore the impact of between-view. Thus, they can't well exploit the within-view spatial structure and between-view complementary information. To resolve this issue, we present semi-non-negative tensor factorization (Semi-NTF) and develop a novel multi-view clustering based on Semi-NTF with one-side orthogonal constraint. Our model directly performs Semi-NTF on the 3rd-order tensor which is composed of anchor graphs of views. Thus, our model directly considers the between-view relationship. Moreover, we use the tensor Schatten p-norm regularization as a rank approximation of the 3rd-order tensor which characterizes the cluster structure of multi-view data and exploits the between-view complementary information. In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point. Extensive experiments on various benchmark datasets indicate that our proposed method is able to achieve satisfactory clustering performance.
    Context-aware Bayesian Mixed Multinomial Logit Model. (arXiv:2210.05737v2 [stat.ML] UPDATED)
    The mixed multinomial logit model assumes constant preference parameters of a decision-maker throughout different choice situations, which may be considered too strong for certain choice modelling applications. This paper proposes an effective approach to model context-dependent intra-respondent heterogeneity, thereby introducing the concept of the Context-aware Bayesian mixed multinomial logit model, where a neural network maps contextual information to interpretable shifts in the preference parameters of each individual in each choice occasion. The proposed model offers several key advantages. First, it supports both continuous and discrete variables, as well as complex non-linear interactions between both types of variables. Secondly, each context specification is considered jointly as a whole by the neural network rather than each variable being considered independently. Finally, since the neural network parameters are shared across all decision-makers, it can leverage information from other decision-makers to infer the effect of a particular context on a particular decision-maker. Even though the context-aware Bayesian mixed multinomial logit model allows for flexible interactions between attributes, the increase in computational complexity is minor, compared to the mixed multinomial logit model. We illustrate the concept and interpretation of the proposed model in a simulation study. We furthermore present a real-world case study from the travel behaviour domain - a bicycle route choice model, based on a large-scale, crowdsourced dataset of GPS trajectories including 119,448 trips made by 8,555 cyclists.
    Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence. (arXiv:2206.06424v4 [cs.LG] UPDATED)
    Next generation cellular networks will implement radio sensing functions alongside customary communications, thereby enabling unprecedented worldwide sensing coverage outdoors. Deep learning has revolutionised computer vision but has had limited application to radio perception tasks, in part due to lack of systematic datasets and benchmarks dedicated to the study of the performance and promise of radio sensing. To address this gap, we present MaxRay: a synthetic radio-visual dataset and benchmark that facilitate precise target localisation in radio. We further propose to learn to localise targets in radio without supervision by extracting self-coordinates from radio-visual correspondence. We use such self-supervised coordinates to train a radio localiser network. We characterise our performance against a number of state-of-the-art baselines. Our results indicate that accurate radio target localisation can be automatically learned from paired radio-visual data without labels, which is important for empirical data. This opens the door for vast data scalability and may prove key to realising the promise of robust radio sensing atop a unified communication-perception cellular infrastructure. Dataset will be hosted on IEEE DataPort.
    Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos. (arXiv:2303.16897v1 [cs.CV])
    Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound. However, they require fine details of both the object geometries and impact locations, which are rarely available in the real world and can not be applied to synthesize impact sounds from common videos. On the other hand, existing video-driven deep learning-based approaches could only capture the weak correspondence between visual content and impact sounds since they lack of physics knowledge. In this work, we propose a physics-driven diffusion model that can synthesize high-fidelity impact sound for a silent video clip. In addition to the video content, we propose to use additional physics priors to guide the impact sound synthesis procedure. The physics priors include both physics parameters that are directly estimated from noisy real-world impact sound examples without sophisticated setup and learned residual parameters that interpret the sound environment via neural networks. We further implement a novel diffusion model with specific training and inference strategies to combine physics priors and visual information for impact sound synthesis. Experimental results show that our model outperforms several existing systems in generating realistic impact sounds. More importantly, the physics-based representations are fully interpretable and transparent, thus enabling us to perform sound editing flexibly.
    Topological Point Cloud Clustering. (arXiv:2303.16716v1 [math.AT])
    We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features. TPCC synthesizes desirable features from spectral clustering and topological data analysis and is based on considering the spectral properties of a simplicial complex associated to the considered point cloud. As it is based on considering sparse eigenvector computations, TPCC is similarly easy to interpret and implement as spectral clustering. However, by focusing not just on a single matrix associated to a graph created from the point cloud data, but on a whole set of Hodge-Laplacians associated to an appropriately constructed simplicial complex, we can leverage a far richer set of topological features to characterize the data points within the point cloud and benefit from the relative robustness of topological techniques against noise. We test the performance of TPCC on both synthetic and real-world data and compare it with classical spectral clustering.
    Exploring celebrity influence on public attitude towards the COVID-19 pandemic: social media shared sentiment analysis. (arXiv:2303.16759v1 [cs.CL])
    The COVID-19 pandemic has introduced new opportunities for health communication, including an increase in the public use of online outlets for health-related emotions. People have turned to social media networks to share sentiments related to the impacts of the COVID-19 pandemic. In this paper we examine the role of social messaging shared by Persons in the Public Eye (i.e. athletes, politicians, news personnel) in determining overall public discourse direction. We harvested approximately 13 million tweets ranging from 1 January 2020 to 1 March 2022. The sentiment was calculated for each tweet using a fine-tuned DistilRoBERTa model, which was used to compare COVID-19 vaccine-related Twitter posts (tweets) that co-occurred with mentions of People in the Public Eye. Our findings suggest the presence of consistent patterns of emotional content co-occurring with messaging shared by Persons in the Public Eye for the first two years of the COVID-19 pandemic influenced public opinion and largely stimulated online public discourse. We demonstrate that as the pandemic progressed, public sentiment shared on social networks was shaped by risk perceptions, political ideologies and health-protective behaviours shared by Persons in the Public Eye, often in a negative light.
    Ensemble Learning Model on Artificial Neural Network-Backpropagation (ANN-BP) Architecture for Coal Pillar Stability Classification. (arXiv:2303.16524v1 [cs.LG])
    Pillars are important structural units used to ensure mining safety in underground hard rock mines. Therefore, precise predictions regarding the stability of underground pillars are required. One common index that is often used to assess pillar stability is the Safety Factor (SF). Unfortunately, such crisp boundaries in pillar stability assessment using SF are unreliable. This paper presents a novel application of Artificial Neural Network-Backpropagation (ANN-BP) and Deep Ensemble Learning for pillar stability classification. There are three types of ANN-BP used for the classification of pillar stability distinguished by their activation functions: ANN-BP ReLU, ANN-BP ELU, and ANN-BP GELU. This research also presents a new labeling alternative for pillar stability by considering its suitability with the SF. Thus, pillar stability is expanded into four categories: failed with a suitable safety factor, intact with a suitable safety factor, failed without a suitable safety factor, and intact without a suitable safety factor. There are five inputs used for each model: pillar width, mining height, bord width, depth to floor, and ratio. The results showed that the ANN-BP model with Ensemble Learning could improve ANN-BP performance with an average accuracy of 86.48% and an F_2-score of 96.35% for the category of failed with a suitable safety factor.
    PMAA: A Progressive Multi-scale Attention Autoencoder Model for High-Performance Cloud Removal from Multi-temporal Satellite Imagery. (arXiv:2303.16565v1 [cs.CV])
    Satellite imagery analysis plays a vital role in remote sensing, but the information loss caused by cloud cover seriously hinders its application. This study presents a high-performance cloud removal architecture called Progressive Multi-scale Attention Autoencoder (PMAA), which simultaneously leverages global and local information. It mainly consists of a cloud detection backbone and a cloud removal module. The cloud detection backbone uses cloud masks to reinforce cloudy areas to prompt the cloud removal module. The cloud removal module mainly comprises a novel Multi-scale Attention Module (MAM) and a Local Interaction Module (LIM). PMAA establishes the long-range dependency of multi-scale features using MAM and modulates the reconstruction of the fine-grained details using LIM, allowing for the simultaneous representation of fine- and coarse-grained features at the same level. With the help of diverse and multi-scale feature representation, PMAA outperforms the previous state-of-the-art model CTGAN consistently on the Sen2_MTC_Old and Sen2_MTC_New datasets. Furthermore, PMAA has a considerable efficiency advantage, with only 0.5% and 14.6% of the parameters and computational complexity of CTGAN, respectively. These extensive results highlight the potential of PMAA as a lightweight cloud removal network suitable for deployment on edge devices. We will release the code and trained models to facilitate the study in this direction.
    Deep transfer learning for detecting Covid-19, Pneumonia and Tuberculosis using CXR images -- A Review. (arXiv:2303.16754v1 [eess.IV])
    Chest X-rays remains to be the most common imaging modality used to diagnose lung diseases. However, they necessitate the interpretation of experts (radiologists and pulmonologists), who are few. This review paper investigates the use of deep transfer learning techniques to detect COVID-19, pneumonia, and tuberculosis in chest X-ray (CXR) images. It provides an overview of current state-of-the-art CXR image classification techniques and discusses the challenges and opportunities in applying transfer learning to this domain. The paper provides a thorough examination of recent research studies that used deep transfer learning algorithms for COVID-19, pneumonia, and tuberculosis detection, highlighting the advantages and disadvantages of these approaches. Finally, the review paper discusses future research directions in the field of deep transfer learning for CXR image classification, as well as the potential for these techniques to aid in the diagnosis and treatment of lung diseases.
    Supervised Learning for Table Tennis Match Prediction. (arXiv:2303.16776v1 [cs.LG])
    Machine learning, classification and prediction models have applications across a range of fields. Sport analytics is an increasingly popular application, but most existing work is focused on automated refereeing in mainstream sports and injury prevention. Research on other sports, such as table tennis, has only recently started gaining more traction. This paper proposes the use of machine learning to predict the outcome of table tennis single matches. We use player and match statistics as features and evaluate their relative importance in an ablation study. In terms of models, a number of popular models were explored. We found that 5-fold cross-validation and hyperparameter tuning was crucial to improve model performance. We investigated different feature aggregation strategies in our ablation study to demonstrate the robustness of the models. Different models performed comparably, with the accuracy of the results (61-70%) matching state-of-the-art models in comparable sports, such as tennis. The results can serve as a baseline for future table tennis prediction models, and can feed back to prediction research in similar ball sports.
    Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees. (arXiv:2303.16841v1 [cs.LG])
    In this paper, we propose a randomly projected convex clustering model for clustering a collection of $n$ high dimensional data points in $\mathbb{R}^d$ with $K$ hidden clusters. Compared to the convex clustering model for clustering original data with dimension $d$, we prove that, under some mild conditions, the perfect recovery of the cluster membership assignments of the convex clustering model, if exists, can be preserved by the randomly projected convex clustering model with embedding dimension $m = O(\epsilon^{-2}\log(n))$, where $0 < \epsilon < 1$ is some given parameter. We further prove that the embedding dimension can be improved to be $O(\epsilon^{-2}\log(K))$, which is independent of the number of data points. Extensive numerical experiment results will be presented in this paper to demonstrate the robustness and superior performance of the randomly projected convex clustering model. The numerical results presented in this paper also demonstrate that the randomly projected convex clustering model can outperform the randomly projected K-means model in practice.
    Application of probabilistic modeling and automated machine learning framework for high-dimensional stress field. (arXiv:2303.16869v1 [cs.CE])
    Modern computational methods, involving highly sophisticated mathematical formulations, enable several tasks like modeling complex physical phenomenon, predicting key properties and design optimization. The higher fidelity in these computer models makes it computationally intensive to query them hundreds of times for optimization and one usually relies on a simplified model albeit at the cost of losing predictive accuracy and precision. Towards this, data-driven surrogate modeling methods have shown a lot of promise in emulating the behavior of the expensive computer models. However, a major bottleneck in such methods is the inability to deal with high input dimensionality and the need for relatively large datasets. With such problems, the input and output quantity of interest are tensors of high dimensionality. Commonly used surrogate modeling methods for such problems, suffer from requirements like high number of computational evaluations that precludes one from performing other numerical tasks like uncertainty quantification and statistical analysis. In this work, we propose an end-to-end approach that maps a high-dimensional image like input to an output of high dimensionality or its key statistics. Our approach uses two main framework that perform three steps: a) reduce the input and output from a high-dimensional space to a reduced or low-dimensional space, b) model the input-output relationship in the low-dimensional space, and c) enable the incorporation of domain-specific physical constraints as masks. In order to accomplish the task of reducing input dimensionality we leverage principal component analysis, that is coupled with two surrogate modeling methods namely: a) Bayesian hybrid modeling, and b) DeepHyper's deep neural networks. We demonstrate the applicability of the approach on a problem of a linear elastic stress field data.
    Physical Deep Reinforcement Learning Towards Safety Guarantee. (arXiv:2303.16860v1 [cs.LG])
    Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, the safety and stability still remain major concerns that hinder the applications of DRL to safety-critical autonomous systems. To address the concerns, we proposed the Phy-DRL: a physical deep reinforcement learning framework. The Phy-DRL is novel in two architectural designs: i) Lyapunov-like reward, and ii) residual control (i.e., integration of physics-model-based control and data-driven control). The concurrent physical reward and residual control empower the Phy-DRL the (mathematically) provable safety and stability guarantees. Through experiments on the inverted pendulum, we show that the Phy-DRL features guaranteed safety and stability and enhanced robustness, while offering remarkably accelerated training and enlarged reward.
    Probabilistic inverse optimal control with local linearization for non-linear partially observable systems. (arXiv:2303.16698v1 [cs.LG])
    Inverse optimal control methods can be used to characterize behavior in sequential decision-making tasks. Most existing work, however, requires the control signals to be known, or is limited to fully-observable or linear systems. This paper introduces a probabilistic approach to inverse optimal control for stochastic non-linear systems with missing control signals and partial observability that unifies existing approaches. By using an explicit model of the noise characteristics of the sensory and control systems of the agent in conjunction with local linearization techniques, we derive an approximate likelihood for the model parameters, which can be computed within a single forward pass. We evaluate our proposed method on stochastic and partially observable version of classic control tasks, a navigation task, and a manual reaching task. The proposed method has broad applicability, ranging from imitation learning to sensorimotor neuroscience.
    Module-based regularization improves Gaussian graphical models when observing noisy data. (arXiv:2303.16796v1 [physics.data-an])
    Researchers often represent relations in multi-variate correlational data using Gaussian graphical models, which require regularization to sparsify the models. Acknowledging that they often study the modular structure of the inferred network, we suggest integrating it in the cross-validation of the regularization strength to balance under- and overfitting. Using synthetic and real data, we show that this approach allows us to better recover and infer modular structure in noisy data compared with the graphical lasso, a standard approach using the Gaussian log-likelihood when cross-validating the regularization strength.
    Zero-shot Entailment of Leaderboards for Empirical AI Research. (arXiv:2303.16835v1 [cs.CL])
    We present a large-scale empirical investigation of the zero-shot learning phenomena in a specific recognizing textual entailment (RTE) task category, i.e. the automated mining of leaderboards for Empirical AI Research. The prior reported state-of-the-art models for leaderboards extraction formulated as an RTE task, in a non-zero-shot setting, are promising with above 90% reported performances. However, a central research question remains unexamined: did the models actually learn entailment? Thus, for the experiments in this paper, two prior reported state-of-the-art models are tested out-of-the-box for their ability to generalize or their capacity for entailment, given leaderboard labels that were unseen during training. We hypothesize that if the models learned entailment, their zero-shot performances can be expected to be moderately high as well--perhaps, concretely, better than chance. As a result of this work, a zero-shot labeled dataset is created via distant labeling formulating the leaderboard extraction RTE task.
    Language Models Trained on Media Diets Can Predict Public Opinion. (arXiv:2303.16779v1 [cs.CL])
    Public opinion reflects and shapes societal behavior, but the traditional survey-based tools to measure it are limited. We introduce a novel approach to probe media diet models -- language models adapted to online news, TV broadcast, or radio show content -- that can emulate the opinions of subpopulations that have consumed a set of media. To validate this method, we use as ground truth the opinions expressed in U.S. nationally representative surveys on COVID-19 and consumer confidence. Our studies indicate that this approach is (1) predictive of human judgements found in survey response distributions and robust to phrasing and channels of media exposure, (2) more accurate at modeling people who follow media more closely, and (3) aligned with literature on which types of opinions are affected by media consumption. Probing language models provides a powerful new method for investigating media effects, has practical applications in supplementing polls and forecasting public opinion, and suggests a need for further study of the surprising fidelity with which neural language models can predict human responses.
    Fairlearn: Assessing and Improving Fairness of AI Systems. (arXiv:2303.16626v1 [cs.LG])
    Fairlearn is an open source project to help practitioners assess and improve fairness of artificial intelligence (AI) systems. The associated Python library, also named fairlearn, supports evaluation of a model's output across affected populations and includes several algorithms for mitigating fairness issues. Grounded in the understanding that fairness is a sociotechnical challenge, the project integrates learning resources that aid practitioners in considering a system's broader societal context.
    Multi-Agent Reinforcement Learning with Action Masking for UAV-enabled Mobile Communications. (arXiv:2303.16737v1 [cs.MA])
    Unmanned Aerial Vehicles (UAVs) are increasingly used as aerial base stations to provide ad hoc communications infrastructure. Building upon prior research efforts which consider either static nodes, 2D trajectories or single UAV systems, this paper focuses on the use of multiple UAVs for providing wireless communication to mobile users in the absence of terrestrial communications infrastructure. In particular, we jointly optimize UAV 3D trajectory and NOMA power allocation to maximize system throughput. Firstly, a weighted K-means-based clustering algorithm establishes UAV-user associations at regular intervals. The efficacy of training a novel Shared Deep Q-Network (SDQN) with action masking is then explored. Unlike training each UAV separately using DQN, the SDQN reduces training time by using the experiences of multiple UAVs instead of a single agent. We also show that SDQN can be used to train a multi-agent system with differing action spaces. Simulation results confirm that: 1) training a shared DQN outperforms a conventional DQN in terms of maximum system throughput (+20%) and training time (-10%); 2) it can converge for agents with different action spaces, yielding a 9% increase in throughput compared to mutual learning algorithms; and 3) combining NOMA with an SDQN architecture enables the network to achieve a better sum rate compared with existing baseline schemes.
    Diffusion Schr\"odinger Bridge Matching. (arXiv:2303.16852v1 [stat.ML])
    Solving transport problems, i.e. finding a map transporting one given distribution to another, has numerous applications in machine learning. Novel mass transport methods motivated by generative modeling have recently been proposed, e.g. Denoising Diffusion Models (DDMs) and Flow Matching Models (FMMs) implement such a transport through a Stochastic Differential Equation (SDE) or an Ordinary Differential Equation (ODE). However, while it is desirable in many applications to approximate the deterministic dynamic Optimal Transport (OT) map which admits attractive properties, DDMs and FMMs are not guaranteed to provide transports close to the OT map. In contrast, Schr\"odinger bridges (SBs) compute stochastic dynamic mappings which recover entropy-regularized versions of OT. Unfortunately, existing numerical methods approximating SBs either scale poorly with dimension or accumulate errors across iterations. In this work, we introduce Iterative Markovian Fitting, a new methodology for solving SB problems, and Diffusion Schr\"odinger Bridge Matching (DSBM), a novel numerical algorithm for computing IMF iterates. DSBM significantly improves over previous SB numerics and recovers as special/limiting cases various recent transport methods. We demonstrate the performance of DSBM on a variety of problems.
    ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models. (arXiv:2303.16452v1 [cs.LG])
    Protein language models (pLMs), pre-trained via causal language modeling on protein sequences, have been a promising tool for protein sequence design. In real-world protein engineering, there are many cases where the amino acids in the middle of a protein sequence are optimized while maintaining other residues. Unfortunately, because of the left-to-right nature of pLMs, existing pLMs modify suffix residues by prompting prefix residues, which are insufficient for the infilling task that considers the whole surrounding context. To find the more effective pLMs for protein engineering, we design a new benchmark, Secondary structureE InFilling rEcoveRy, SEIFER, which approximates infilling sequence design scenarios. With the evaluation of existing models on the benchmark, we reveal the weakness of existing language models and show that language models trained via fill-in-middle transformation, called ProtFIM, are more appropriate for protein engineering. Also, we prove that ProtFIM generates protein sequences with decent protein representations through exhaustive experiments and visualizations.
    Futures Quantitative Investment with Heterogeneous Continual Graph Neural Network. (arXiv:2303.16532v1 [cs.LG])
    It is a challenging problem to predict trends of futures prices with traditional econometric models as one needs to consider not only futures' historical data but also correlations among different futures. Spatial-temporal graph neural networks (STGNNs) have great advantages in dealing with such kind of spatial-temporal data. However, we cannot directly apply STGNNs to high-frequency future data because future investors have to consider both the long-term and short-term characteristics when doing decision-making. To capture both the long-term and short-term features, we exploit more label information by designing four heterogeneous tasks: price regression, price moving average regression, price gap regression (within a short interval), and change-point detection, which involve both long-term and short-term scenes. To make full use of these labels, we train our model in a continual manner. Traditional continual GNNs define the gradient of prices as the parameter important to overcome catastrophic forgetting (CF). Unfortunately, the losses of the four heterogeneous tasks lie in different spaces. Hence it is improper to calculate the parameter importance with their losses. We propose to calculate parameter importance with mutual information between original observations and the extracted features. The empirical results based on 49 commodity futures demonstrate that our model has higher prediction performance on capturing long-term or short-term dynamic change.
    Quantum Deep Hedging. (arXiv:2303.16585v1 [quant-ph])
    Quantum machine learning has the potential for a transformative impact across industry sectors and in particular in finance. In our work we look at the problem of hedging where deep reinforcement learning offers a powerful framework for real markets. We develop quantum reinforcement learning methods based on policy-search and distributional actor-critic algorithms that use quantum neural network architectures with orthogonal and compound layers for the policy and value functions. We prove that the quantum neural networks we use are trainable, and we perform extensive simulations that show that quantum models can reduce the number of trainable parameters while achieving comparable performance and that the distributional approach obtains better performance than other standard approaches, both classical and quantum. We successfully implement the proposed models on a trapped-ion quantum processor, utilizing circuits with up to $16$ qubits, and observe performance that agrees well with noiseless simulation. Our quantum techniques are general and can be applied to other reinforcement learning problems beyond hedging.
    VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking. (arXiv:2303.16727v1 [cs.CV])
    Scale is the primary factor for building a powerful foundation model that could well generalize to a variety of downstream tasks. However, it is still challenging to train video foundation models with billions of parameters. This paper shows that video masked autoencoder (VideoMAE) is a scalable and general self-supervised pre-trainer for building video foundation models. We scale the VideoMAE in both model and data with a core design. Specifically, we present a dual masking strategy for efficient pre-training, with an encoder operating on a subset of video tokens and a decoder processing another subset of video tokens. Although VideoMAE is very efficient due to high masking ratio in encoder, masking decoder can still further reduce the overall computational cost. This enables the efficient pre-training of billion-level models in video. We also use a progressive training paradigm that involves an initial pre-training on a diverse multi-sourced unlabeled dataset, followed by a post-pre-training on a mixed labeled dataset. Finally, we successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2). In addition, we extensively verify the pre-trained video ViT models on a variety of downstream tasks, demonstrating its effectiveness as a general video representation learner.
    Nonlinear Independent Component Analysis for Principled Disentanglement in Unsupervised Deep Learning. (arXiv:2303.16535v1 [cs.LG])
    A central problem in unsupervised deep learning is how to find useful representations of high-dimensional data, sometimes called "disentanglement". Most approaches are heuristic and lack a proper theoretical foundation. In linear representation learning, independent component analysis (ICA) has been successful in many applications areas, and it is principled, i.e. based on a well-defined probabilistic model. However, extension of ICA to the nonlinear case has been problematic due to the lack of identifiability, i.e. uniqueness of the representation. Recently, nonlinear extensions that utilize temporal structure or some auxiliary information have been proposed. Such models are in fact identifiable, and consequently, an increasing number of algorithms have been developed. In particular, some self-supervised algorithms can be shown to estimate nonlinear ICA, even though they have initially been proposed from heuristic perspectives. This paper reviews the state-of-the-art of nonlinear ICA theory and algorithms.
    Learning Flow Functions from Data with Applications to Nonlinear Oscillators. (arXiv:2303.16656v1 [eess.SY])
    We describe a recurrent neural network (RNN) based architecture to learn the flow function of a causal, time-invariant and continuous-time control system from trajectory data. By restricting the class of control inputs to piecewise constant functions, we show that learning the flow function is equivalent to learning the input-to-state map of a discrete-time dynamical system. This motivates the use of an RNN together with encoder and decoder networks which map the state of the system to the hidden state of the RNN and back. We show that the proposed architecture is able to approximate the flow function by exploiting the system's causality and time-invariance. The output of the learned flow function model can be queried at any time instant. We experimentally validate the proposed method using models of the Van der Pol and FitzHugh Nagumo oscillators. In both cases, the results demonstrate that the architecture is able to closely reproduce the trajectories of these two systems. For the Van der Pol oscillator, we further show that the trained model generalises to the system's response with a prolonged prediction time horizon as well as control inputs outside the training distribution. For the FitzHugh-Nagumo oscillator, we show that the model accurately captures the input-dependent phenomena of excitability.
    A Byzantine-Resilient Aggregation Scheme for Federated Learning via Matrix Autoregression on Client Updates. (arXiv:2303.16668v1 [cs.LG])
    In this work, we propose FLANDERS, a novel federated learning (FL) aggregation scheme robust to Byzantine attacks. FLANDERS considers the local model updates sent by clients at each FL round as a matrix-valued time series. Then, it identifies malicious clients as outliers of this time series by comparing actual observations with those estimated by a matrix autoregressive forecasting model. Experiments conducted on several datasets under different FL settings demonstrate that FLANDERS matches the robustness of the most powerful baselines against Byzantine clients. Furthermore, FLANDERS remains highly effective even under extremely severe attack scenarios, as opposed to existing defense strategies.
    A Machine Learning Outlook: Post-processing of Global Medium-range Forecasts. (arXiv:2303.16301v1 [cs.LG])
    Post-processing typically takes the outputs of a Numerical Weather Prediction (NWP) model and applies linear statistical techniques to produce improve localized forecasts, by including additional observations, or determining systematic errors at a finer scale. In this pilot study, we investigate the benefits and challenges of using non-linear neural network (NN) based methods to post-process multiple weather features -- temperature, moisture, wind, geopotential height, precipitable water -- at 30 vertical levels, globally and at lead times up to 7 days. We show that we can achieve accuracy improvements of up to 12% (RMSE) in a field such as temperature at 850hPa for a 7 day forecast. However, we recognize the need to strengthen foundational work on objectively measuring a sharp and correct forecast. We discuss the challenges of using standard metrics such as root mean squared error (RMSE) or anomaly correlation coefficient (ACC) as we move from linear statistical models to more complex non-linear machine learning approaches for post-processing global weather forecasts.
    Policy Reuse for Communication Load Balancing in Unseen Traffic Scenarios. (arXiv:2303.16685v1 [cs.NI])
    With the continuous growth in communication network complexity and traffic volume, communication load balancing solutions are receiving increasing attention. Specifically, reinforcement learning (RL)-based methods have shown impressive performance compared with traditional rule-based methods. However, standard RL methods generally require an enormous amount of data to train, and generalize poorly to scenarios that are not encountered during training. We propose a policy reuse framework in which a policy selector chooses the most suitable pre-trained RL policy to execute based on the current traffic condition. Our method hinges on a policy bank composed of policies trained on a diverse set of traffic scenarios. When deploying to an unknown traffic scenario, we select a policy from the policy bank based on the similarity between the previous-day traffic of the current scenario and the traffic observed during training. Experiments demonstrate that this framework can outperform classical and adaptive rule-based methods by a large margin.
    TraVaG: Differentially Private Trace Variant Generation Using GANs. (arXiv:2303.16704v1 [cs.LG])
    Process mining is rapidly growing in the industry. Consequently, privacy concerns regarding sensitive and private information included in event data, used by process mining algorithms, are becoming increasingly relevant. State-of-the-art research mainly focuses on providing privacy guarantees, e.g., differential privacy, for trace variants that are used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques for releasing trace variants still do not fulfill all the requirements of industry-scale usage. Moreover, providing privacy guarantees when there exists a high rate of infrequent trace variants is still a challenge. In this paper, we introduce TraVaG as a new approach for releasing differentially private trace variants based on \text{Generative Adversarial Networks} (GANs) that provides industry-scale benefits and enhances the level of privacy guarantees when there exists a high ratio of infrequent variants. Moreover, TraVaG overcomes shortcomings of conventional privacy preservation techniques such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data show that our approach outperforms state-of-the-art techniques in terms of privacy guarantees, plain data utility preservation, and result utility preservation.
    Targeted Adversarial Attacks on Wind Power Forecasts. (arXiv:2303.16633v1 [cs.LG])
    In recent years, researchers proposed a variety of deep learning models for wind power forecasting. These models predict the wind power generation of wind farms or entire regions more accurately than traditional machine learning algorithms or physical models. However, latest research has shown that deep learning models can often be manipulated by adversarial attacks. Since wind power forecasts are essential for the stability of modern power systems, it is important to protect them from this threat. In this work, we investigate the vulnerability of two different forecasting models to targeted, semitargeted, and untargeted adversarial attacks. We consider a Long Short-Term Memory (LSTM) network for predicting the power generation of a wind farm and a Convolutional Neural Network (CNN) for forecasting the wind power generation throughout Germany. Moreover, we propose the Total Adversarial Robustness Score (TARS), an evaluation metric for quantifying the robustness of regression models to targeted and semi-targeted adversarial attacks. It assesses the impact of attacks on the model's performance, as well as the extent to which the attacker's goal was achieved, by assigning a score between 0 (very vulnerable) and 1 (very robust). In our experiments, the LSTM forecasting model was fairly robust and achieved a TARS value of over 0.81 for all adversarial attacks investigated. The CNN forecasting model only achieved TARS values below 0.06 when trained ordinarily, and was thus very vulnerable. Yet, its robustness could be significantly improved by adversarial training, which always resulted in a TARS above 0.46.
    Importance Sampling for Stochastic Gradient Descent in Deep Neural Networks. (arXiv:2303.16529v1 [cs.LG])
    Stochastic gradient descent samples uniformly the training set to build an unbiased gradient estimate with a limited number of samples. However, at a given step of the training process, some data are more helpful than others to continue learning. Importance sampling for training deep neural networks has been widely studied to propose sampling schemes yielding better performance than the uniform sampling scheme. After recalling the theory of importance sampling for deep learning, this paper reviews the challenges inherent to this research area. In particular, we propose a metric allowing the assessment of the quality of a given sampling scheme; and we study the interplay between the sampling scheme and the optimizer used.
    Communication Load Balancing via Efficient Inverse Reinforcement Learning. (arXiv:2303.16686v1 [cs.NI])
    Communication load balancing aims to balance the load between different available resources, and thus improve the quality of service for network systems. After formulating the load balancing (LB) as a Markov decision process problem, reinforcement learning (RL) has recently proven effective in addressing the LB problem. To leverage the benefits of classical RL for load balancing, however, we need an explicit reward definition. Engineering this reward function is challenging, because it involves the need for expert knowledge and there lacks a general consensus on the form of an optimal reward function. In this work, we tackle the communication load balancing problem from an inverse reinforcement learning (IRL) approach. To the best of our knowledge, this is the first time IRL has been successfully applied in the field of communication load balancing. Specifically, first, we infer a reward function from a set of demonstrations, and then learn a reinforcement learning load balancing policy with the inferred reward function. Compared to classical RL-based solution, the proposed solution can be more general and more suitable for real-world scenarios. Experimental evaluations implemented on different simulated traffic scenarios have shown our method to be effective and better than other baselines by a considerable margin.
    An EMO Joint Pruning with Multiple Sub-networks: Fast and Effect. (arXiv:2303.16212v1 [cs.LG])
    The network pruning algorithm based on evolutionary multi-objective (EMO) can balance the pruning rate and performance of the network. However, its population-based nature often suffers from the complex pruning optimization space and the highly resource-consuming pruning structure verification process, which limits its application. To this end, this paper proposes an EMO joint pruning with multiple sub-networks (EMO-PMS) to reduce space complexity and resource consumption. First, a divide-and-conquer EMO network pruning framework is proposed, which decomposes the complex EMO pruning task on the whole network into easier sub-tasks on multiple sub-networks. On the one hand, this decomposition reduces the pruning optimization space and decreases the optimization difficulty; on the other hand, the smaller network structure converges faster, so the computational resource consumption of the proposed algorithm is lower. Secondly, a sub-network training method based on cross-network constraints is designed so that the sub-network can process the features generated by the previous one through feature constraints. This method allows sub-networks optimized independently to collaborate better and improves the overall performance of the pruned network. Finally, a multiple sub-networks joint pruning method based on EMO is proposed. For one thing, it can accurately measure the feature processing capability of the sub-networks with the pre-trained feature selector. For another, it can combine multi-objective pruning results on multiple sub-networks through global performance impairment ranking to design a joint pruning scheme. The proposed algorithm is validated on three datasets with different challenging. Compared with fifteen advanced pruning algorithms, the experiment results exhibit the effectiveness and efficiency of the proposed algorithm.
    Fair Federated Medical Image Segmentation via Client Contribution Estimation. (arXiv:2303.16520v1 [cs.LG])
    How to ensure fairness is an important topic in federated learning (FL). Recent studies have investigated how to reward clients based on their contribution (collaboration fairness), and how to achieve uniformity of performance across clients (performance fairness). Despite achieving progress on either one, we argue that it is critical to consider them together, in order to engage and motivate more diverse clients joining FL to derive a high-quality global model. In this work, we propose a novel method to optimize both types of fairness simultaneously. Specifically, we propose to estimate client contribution in gradient and data space. In gradient space, we monitor the gradient direction differences of each client with respect to others. And in data space, we measure the prediction error on client data using an auxiliary model. Based on this contribution estimation, we propose a FL method, federated training via contribution estimation (FedCE), i.e., using estimation as global model aggregation weights. We have theoretically analyzed our method and empirically evaluated it on two real-world medical datasets. The effectiveness of our approach has been validated with significant performance improvements, better collaboration fairness, better performance fairness, and comprehensive analytical studies.
    Facial recognition technology can expose political orientation from facial images even when controlling for demographics and self-presentation. (arXiv:2303.16343v1 [cs.CV])
    A facial recognition algorithm was used to extract face descriptors from carefully standardized images of 591 neutral faces taken in the laboratory setting. Face descriptors were entered into a cross-validated linear regression to predict participants' scores on a political orientation scale (Cronbach's alpha=.94) while controlling for age, gender, and ethnicity. The model's performance exceeded r=.20: much better than that of human raters and on par with how well job interviews predict job success, alcohol drives aggressiveness, or psychological therapy improves mental health. Moreover, the model derived from standardized images performed well (r=.12) in a sample of naturalistic images of 3,401 politicians from the U.S., UK, and Canada, suggesting that the associations between facial appearance and political orientation generalize beyond our sample. The analysis of facial features associated with political orientation revealed that conservatives had larger lower faces, although political orientation was only weakly associated with body mass index (BMI). The predictability of political orientation from standardized images has critical implications for privacy, regulation of facial recognition technology, as well as the understanding the origins and consequences of political orientation.
    Towards Quantifying Calibrated Uncertainty via Deep Ensembles in Multi-output Regression Task. (arXiv:2303.16210v1 [cs.LG])
    Deep ensemble is a simple and straightforward approach for approximating Bayesian inference and has been successfully applied to many classification tasks. This study aims to comprehensively investigate this approach in the multi-output regression task to predict the aerodynamic performance of a missile configuration. By scrutinizing the effect of the number of neural networks used in the ensemble, an obvious trend toward underconfidence in estimated uncertainty is observed. In this context, we propose the deep ensemble framework that applies the post-hoc calibration method, and its improved uncertainty quantification performance is demonstrated. It is compared with Gaussian process regression, the most prevalent model for uncertainty quantification in engineering, and is proven to have superior performance in terms of regression accuracy, reliability of estimated uncertainty, and training efficiency. Finally, the impact of the suggested framework on the results of Bayesian optimization is examined, showing that whether or not the deep ensemble is calibrated can result in completely different exploration characteristics. This framework can be seamlessly applied and extended to any regression task, as no special assumptions have been made for the specific problem used in this study.
    Using Connected Vehicle Trajectory Data to Evaluate the Effects of Speeding. (arXiv:2303.16396v1 [cs.LG])
    Speeding has been and continues to be a major contributing factor to traffic fatalities. Various transportation agencies have proposed speed management strategies to reduce the amount of speeding on arterials. While there have been various studies done on the analysis of speeding proportions above the speed limit, few studies have considered the effect on the individual's journey. Many studies utilized speed data from detectors, which is limited in that there is no information of the route that the driver took. This study aims to explore the effects of various roadway features an individual experiences for a given journey on speeding proportions. Connected vehicle trajectory data was utilized to identify the path that a driver took, along with the vehicle related variables. The level of speeding proportion is predicted using multiple learning models. The model with the best performance, Extreme Gradient Boosting, achieved an accuracy of 0.756. The proposed model can be used to understand how the environment and vehicle's path effects the drivers' speeding behavior, as well as predict the areas with high levels of speeding proportions. The results suggested that features related to an individual driver's trip, i.e., total travel time, has a significant contribution towards speeding. Features that are related to the environment of the individual driver's trip, i.e., proportion of residential area, also had a significant effect on reducing speeding proportions. It is expected that the findings could help inform transportation agencies more on the factors related to speeding for an individual driver's trip.
    Hierarchical Video-Moment Retrieval and Step-Captioning. (arXiv:2303.16406v1 [cs.CV])
    There is growing interest in searching for information from large video corpora. Prior works have studied relevant tasks, such as text-based video retrieval, moment retrieval, video summarization, and video captioning in isolation, without an end-to-end setup that can jointly search from video corpora and generate summaries. Such an end-to-end setup would allow for many interesting applications, e.g., a text-based search that finds a relevant video from a video corpus, extracts the most relevant moment from that video, and segments the moment into important steps with captions. To address this, we present the HiREST (HIerarchical REtrieval and STep-captioning) dataset and propose a new benchmark that covers hierarchical information retrieval and visual/textual stepwise summarization from an instructional video corpus. HiREST consists of 3.4K text-video pairs from an instructional video dataset, where 1.1K videos have annotations of moment spans relevant to text query and breakdown of each moment into key instruction steps with caption and timestamps (totaling 8.6K step captions). Our hierarchical benchmark consists of video retrieval, moment retrieval, and two novel moment segmentation and step captioning tasks. In moment segmentation, models break down a video moment into instruction steps and identify start-end boundaries. In step captioning, models generate a textual summary for each step. We also present starting point task-specific and end-to-end joint baseline models for our new benchmark. While the baseline models show some promising results, there still exists large room for future improvement by the community. Project website: https://hirest-cvpr2023.github.io
    mHealth hyperspectral learning for instantaneous spatiospectral imaging of hemodynamics. (arXiv:2303.16205v1 [eess.IV])
    Hyperspectral imaging acquires data in both the spatial and frequency domains to offer abundant physical or biological information. However, conventional hyperspectral imaging has intrinsic limitations of bulky instruments, slow data acquisition rate, and spatiospectral tradeoff. Here we introduce hyperspectral learning for snapshot hyperspectral imaging in which sampled hyperspectral data in a small subarea are incorporated into a learning algorithm to recover the hypercube. Hyperspectral learning exploits the idea that a photograph is more than merely a picture and contains detailed spectral information. A small sampling of hyperspectral data enables spectrally informed learning to recover a hypercube from an RGB image. Hyperspectral learning is capable of recovering full spectroscopic resolution in the hypercube, comparable to high spectral resolutions of scientific spectrometers. Hyperspectral learning also enables ultrafast dynamic imaging, leveraging ultraslow video recording in an off-the-shelf smartphone, given that a video comprises a time series of multiple RGB images. To demonstrate its versatility, an experimental model of vascular development is used to extract hemodynamic parameters via statistical and deep-learning approaches. Subsequently, the hemodynamics of peripheral microcirculation is assessed at an ultrafast temporal resolution up to a millisecond, using a conventional smartphone camera. This spectrally informed learning method is analogous to compressed sensing; however, it further allows for reliable hypercube recovery and key feature extractions with a transparent learning algorithm. This learning-powered snapshot hyperspectral imaging method yields high spectral and temporal resolutions and eliminates the spatiospectral tradeoff, offering simple hardware requirements and potential applications of various machine-learning techniques.
    Improving Code Generation by Training with Natural Language Feedback. (arXiv:2303.16749v1 [cs.SE])
    The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the ground truth distribution and demonstrate a proof-of-concept on a neural program synthesis task. We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and fine-tuning on repaired programs written by humans. Overall, our results suggest that learning from human-written natural language feedback is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM's performance on code generation tasks.
    Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks. (arXiv:2303.16563v1 [cs.LG])
    We study building a multi-task agent in Minecraft. Without human demonstrations, solving long-horizon tasks in this open-ended environment with reinforcement learning (RL) is extremely sample inefficient. To tackle the challenge, we decompose solving Minecraft tasks into learning basic skills and planning over the skills. We propose three types of fine-grained basic skills in Minecraft, and use RL with intrinsic rewards to accomplish basic skills with high success rates. For skill planning, we use Large Language Models to find the relationships between skills and build a skill graph in advance. When the agent is solving a task, our skill search algorithm walks on the skill graph and generates the proper skill plans for the agent. In experiments, our method accomplishes 24 diverse Minecraft tasks, where many tasks require sequentially executing for more than 10 skills. Our method outperforms baselines in most tasks by a large margin. The project's website and code can be found at https://sites.google.com/view/plan4mc.
    Zero-Shot Generalizable End-to-End Task-Oriented Dialog System using Context Summarization and Domain Schema. (arXiv:2303.16252v1 [cs.CL])
    Task-oriented dialog systems empower users to accomplish their goals by facilitating intuitive and expressive natural language interactions. State-of-the-art approaches in task-oriented dialog systems formulate the problem as a conditional sequence generation task and fine-tune pre-trained causal language models in the supervised setting. This requires labeled training data for each new domain or task, and acquiring such data is prohibitively laborious and expensive, thus making it a bottleneck for scaling systems to a wide range of domains. To overcome this challenge, we introduce a novel Zero-Shot generalizable end-to-end Task-oriented Dialog system, ZS-ToD, that leverages domain schemas to allow for robust generalization to unseen domains and exploits effective summarization of the dialog history. We employ GPT-2 as a backbone model and introduce a two-step training process where the goal of the first step is to learn the general structure of the dialog data and the second step optimizes the response generation as well as intermediate outputs, such as dialog state and system actions. As opposed to state-of-the-art systems that are trained to fulfill certain intents in the given domains and memorize task-specific conversational patterns, ZS-ToD learns generic task-completion skills by comprehending domain semantics via domain schemas and generalizing to unseen domains seamlessly. We conduct an extensive experimental evaluation on SGD and SGD-X datasets that span up to 20 unique domains and ZS-ToD outperforms state-of-the-art systems on key metrics, with an improvement of +17% on joint goal accuracy and +5 on inform. Additionally, we present a detailed ablation study to demonstrate the effectiveness of the proposed components and training mechanism
    Federated Learning in MIMO Satellite Broadcast System. (arXiv:2303.16603v1 [eess.SP])
    Federated learning (FL) is a type of distributed machine learning at the wireless edge that preserves the privacy of clients' data from adversaries and even the central server. Existing federated learning approaches either use (i) secure multiparty computation (SMC) which is vulnerable to inference or (ii) differential privacy which may decrease the test accuracy given a large number of parties with relatively small amounts of data each. To tackle the problem with the existing methods in the literature, In this paper, we introduce incorporate federated learning in the inner-working of MIMO systems.
    Neuro-symbolic Rule Learning in Real-world Classification Tasks. (arXiv:2303.16674v1 [cs.LG])
    Neuro-symbolic rule learning has attracted lots of attention as it offers better interpretability than pure neural models and scales better than symbolic rule learning. A recent approach named pix2rule proposes a neural Disjunctive Normal Form (neural DNF) module to learn symbolic rules with feed-forward layers. Although proved to be effective in synthetic binary classification, pix2rule has not been applied to more challenging tasks such as multi-label and multi-class classifications over real-world data. In this paper, we address this limitation by extending the neural DNF module to (i) support rule learning in real-world multi-class and multi-label classification tasks, (ii) enforce the symbolic property of mutual exclusivity (i.e. predicting exactly one class) in multi-class classification, and (iii) explore its scalability over large inputs and outputs. We train a vanilla neural DNF model similar to pix2rule's neural DNF module for multi-label classification, and we propose a novel extended model called neural DNF-EO (Exactly One) which enforces mutual exclusivity in multi-class classification. We evaluate the classification performance, scalability and interpretability of our neural DNF-based models, and compare them against pure neural models and a state-of-the-art symbolic rule learner named FastLAS. We demonstrate that our neural DNF-based models perform similarly to neural networks, but provide better interpretability by enabling the extraction of logical rules. Our models also scale well when the rule search space grows in size, in contrast to FastLAS, which fails to learn in multi-class classification tasks with 200 classes and in all multi-label settings.
    Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters. (arXiv:2303.16548v1 [math.OC])
    This paper studies an infinite horizon optimal control problem for discrete-time linear system and quadratic criteria, both with random parameters which are independent and identically distributed with respect to time. In this general setting, we apply the policy gradient method, a reinforcement learning technique, to search for the optimal control without requiring knowledge of statistical information of the parameters. We investigate the sub-Gaussianity of the state process and establish global linear convergence guarantee for this approach based on assumptions that are weaker and easier to verify compared to existing results. Numerical experiments are presented to illustrate our result.
    Poster: Link between Bias, Node Sensitivity and Long-Tail Distribution in trained DNNs. (arXiv:2303.16589v1 [cs.LG])
    Owing to their remarkable learning (and relearning) capabilities, deep neural networks (DNNs) find use in numerous real-world applications. However, the learning of these data-driven machine learning models is generally as good as the data available to them for training. Hence, training datasets with long-tail distribution pose a challenge for DNNs, since the DNNs trained on them may provide a varying degree of classification performance across different output classes. While the overall bias of such networks is already highlighted in existing works, this work identifies the node bias that leads to a varying sensitivity of the nodes for different output classes. To the best of our knowledge, this is the first work highlighting this unique challenge in DNNs, discussing its probable causes, and providing open challenges for this new research direction. We support our reasoning using an empirical case study of the networks trained on a real-world dataset.
    Problems and shortcuts in deep learning for screening mammography. (arXiv:2303.16417v1 [cs.CV])
    This work reveals undiscovered challenges in the performance and generalizability of deep learning models. We (1) identify spurious shortcuts and evaluation issues that can inflate performance and (2) propose training and analysis methods to address them. We trained an AI model to classify cancer on a retrospective dataset of 120,112 US exams (3,467 cancers) acquired from 2008 to 2017 and 16,693 UK exams (5,655 cancers) acquired from 2011 to 2015. We evaluated on a screening mammography test set of 11,593 US exams (102 cancers; 7,594 women; age 57.1 \pm 11.0) and 1,880 UK exams (590 cancers; 1,745 women; age 63.3 \pm 7.2). A model trained on images of only view markers (no breast) achieved a 0.691 AUC. The original model trained on both datasets achieved a 0.945 AUC on the combined US+UK dataset but paradoxically only 0.838 and 0.892 on the US and UK datasets, respectively. Sampling cancers equally from both datasets during training mitigated this shortcut. A similar AUC paradox (0.903) occurred when evaluating diagnostic exams vs screening exams (0.862 vs 0.861, respectively). Removing diagnostic exams during training alleviated this bias. Finally, the model did not exhibit the AUC paradox over scanner models but still exhibited a bias toward Selenia Dimension (SD) over Hologic Selenia (HS) exams. Analysis showed that this AUC paradox occurred when a dataset attribute had values with a higher cancer prevalence (dataset bias) and the model consequently assigned a higher probability to these attribute values (model bias). Stratification and balancing cancer prevalence can mitigate shortcuts during evaluation. Dataset and model bias can introduce shortcuts and the AUC paradox, potentially pervasive issues within the healthcare AI space. Our methods can verify and mitigate shortcuts while providing a clear understanding of performance.
    Local Interpretability of Random Forests for Multi-Target Regression. (arXiv:2303.16506v1 [cs.LG])
    Multi-target regression is useful in a plethora of applications. Although random forest models perform well in these tasks, they are often difficult to interpret. Interpretability is crucial in machine learning, especially when it can directly impact human well-being. Although model-agnostic techniques exist for multi-target regression, specific techniques tailored to random forest models are not available. To address this issue, we propose a technique that provides rule-based interpretations for instances made by a random forest model for multi-target regression, influenced by a recent model-specific technique for random forest interpretability. The proposed technique was evaluated through extensive experiments and shown to offer competitive interpretations compared to state-of-the-art techniques.
    Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints. (arXiv:2303.16510v1 [stat.ML])
    Orthogonality constraints naturally appear in many machine learning problems, from Principal Components Analysis to robust neural network training. They are usually solved using Riemannian optimization algorithms, which minimize the objective function while enforcing the constraint. However, enforcing the orthogonality constraint can be the most time-consuming operation in such algorithms. Recently, Ablin & Peyr\'e (2022) proposed the Landing algorithm, a method with cheap iterations that does not enforce the orthogonality constraint but is attracted towards the manifold in a smooth manner. In this article, we provide new practical and theoretical developments for the landing algorithm. First, the method is extended to the Stiefel manifold, the set of rectangular orthogonal matrices. We also consider stochastic and variance reduction algorithms when the cost function is an average of many functions. We demonstrate that all these methods have the same rate of convergence as their Riemannian counterparts that exactly enforce the constraint. Finally, our experiments demonstrate the promise of our approach to an array of machine-learning problems that involve orthogonality constraints.
    Personalised Language Modelling of Screen Characters Using Rich Metadata Annotations. (arXiv:2303.16618v1 [cs.CL])
    Personalisation of language models for dialogue sensitises them to better capture the speaking patterns of people of specific characteristics, and/or in specific environments. However, rich character annotations are difficult to come by and to successfully leverage. In this work, we release and describe a novel set of manual annotations for 863 speakers from the popular Cornell Movie Dialog Corpus, including features like characteristic quotes and character descriptions, and a set of six automatically extracted metadata for over 95% of the featured films. We perform extensive experiments on two corpora and show that such annotations can be effectively used to personalise language models, reducing perplexity by up to 8.5%. Our method can be applied even zero-shot for speakers for whom no prior training data is available, by relying on combinations of characters' demographic characteristics. Since collecting such metadata is costly, we also contribute a cost-benefit analysis to highlight which annotations were most cost-effective relative to the reduction in perplexity.
    Who You Play Affects How You Play: Predicting Sports Performance Using Graph Attention Networks With Temporal Convolution. (arXiv:2303.16741v1 [cs.LG])
    This study presents a novel deep learning method, called GATv2-GCN, for predicting player performance in sports. To construct a dynamic player interaction graph, we leverage player statistics and their interactions during gameplay. We use a graph attention network to capture the attention that each player pays to each other, allowing for more accurate modeling of the dynamic player interactions. To handle the multivariate player statistics time series, we incorporate a temporal convolution layer, which provides the model with temporal predictive power. We evaluate the performance of our model using real-world sports data, demonstrating its effectiveness in predicting player performance. Furthermore, we explore the potential use of our model in a sports betting context, providing insights into profitable strategies that leverage our predictive power. The proposed method has the potential to advance the state-of-the-art in player performance prediction and to provide valuable insights for sports analytics and betting industries.
    A Gold Standard Dataset for the Reviewer Assignment Problem. (arXiv:2303.16750v1 [cs.IR])
    Many peer-review venues are either using or looking to use algorithms to assign submissions to reviewers. The crux of such automated approaches is the notion of the "similarity score"--a numerical estimate of the expertise of a reviewer in reviewing a paper--and many algorithms have been proposed to compute these scores. However, these algorithms have not been subjected to a principled comparison, making it difficult for stakeholders to choose the algorithm in an evidence-based manner. The key challenge in comparing existing algorithms and developing better algorithms is the lack of the publicly available gold-standard data that would be needed to perform reproducible research. We address this challenge by collecting a novel dataset of similarity scores that we release to the research community. Our dataset consists of 477 self-reported expertise scores provided by 58 researchers who evaluated their expertise in reviewing papers they have read previously. We use this data to compare several popular algorithms employed in computer science conferences and come up with recommendations for stakeholders. Our main findings are as follows. First, all algorithms make a non-trivial amount of error. For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases, highlighting the vital need for more research on the similarity-computation problem. Second, most existing algorithms are designed to work with titles and abstracts of papers, and in this regime the Specter+MFR algorithm performs best. Third, to improve performance, it may be important to develop modern deep-learning based algorithms that can make use of the full texts of papers: the classical TD-IDF algorithm enhanced with full texts of papers is on par with the deep-learning based Specter+MFR that cannot make use of this information.
    ChatGPT or academic scientist? Distinguishing authorship with over 99% accuracy using off-the-shelf machine learning tools. (arXiv:2303.16352v1 [cs.LG])
    ChatGPT has enabled access to AI-generated writing for the masses, and within just a few months, this product has disrupted the knowledge economy, initiating a culture shift in the way people work, learn, and write. The need to discriminate human writing from AI is now both critical and urgent, particularly in domains like higher education and academic writing, where AI had not been a significant threat or contributor to authorship. Addressing this need, we developed a method for discriminating text generated by ChatGPT from (human) academic scientists, relying on prevalent and accessible supervised classification methods. We focused on how a particular group of humans, academic scientists, write differently than ChatGPT, and this targeted approach led to the discovery of new features for discriminating (these) humans from AI; as examples, scientists write long paragraphs and have a penchant for equivocal language, frequently using words like but, however, and although. With a set of 20 features, including the aforementioned ones and others, we built a model that assigned the author, as human or AI, at well over 99% accuracy, resulting in 20 times fewer misclassified documents compared to the field-leading approach. This strategy for discriminating a particular set of humans writing from AI could be further adapted and developed by others with basic skills in supervised classification, enabling access to many highly accurate and targeted models for detecting AI usage in academic writing and beyond.
    On the Local Cache Update Rules in Streaming Federated Learning. (arXiv:2303.16340v1 [cs.LG])
    In this study, we address the emerging field of Streaming Federated Learning (SFL) and propose local cache update rules to manage dynamic data distributions and limited cache capacity. Traditional federated learning relies on fixed data sets, whereas in SFL, data is streamed, and its distribution changes over time, leading to discrepancies between the local training dataset and long-term distribution. To mitigate this problem, we propose three local cache update rules - First-In-First-Out (FIFO), Static Ratio Selective Replacement (SRSR), and Dynamic Ratio Selective Replacement (DRSR) - that update the local cache of each client while considering the limited cache capacity. Furthermore, we derive a convergence bound for our proposed SFL algorithm as a function of the distribution discrepancy between the long-term data distribution and the client's local training dataset. We then evaluate our proposed algorithm on two datasets: a network traffic classification dataset and an image classification dataset. Our experimental results demonstrate that our proposed local cache update rules significantly reduce the distribution discrepancy and outperform the baseline methods. Our study advances the field of SFL and provides practical cache management solutions in federated learning.
    LMDA-Net:A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability. (arXiv:2303.16407v1 [cs.LG])
    EEG-based recognition of activities and states involves the use of prior neuroscience knowledge to generate quantitative EEG features, which may limit BCI performance. Although neural network-based methods can effectively extract features, they often encounter issues such as poor generalization across datasets, high predicting volatility, and low model interpretability. Hence, we propose a novel lightweight multi-dimensional attention network, called LMDA-Net. By incorporating two novel attention modules designed specifically for EEG signals, the channel attention module and the depth attention module, LMDA-Net can effectively integrate features from multiple dimensions, resulting in improved classification performance across various BCI tasks. LMDA-Net was evaluated on four high-impact public datasets, including motor imagery (MI) and P300-Speller paradigms, and was compared with other representative models. The experimental results demonstrate that LMDA-Net outperforms other representative methods in terms of classification accuracy and predicting volatility, achieving the highest accuracy in all datasets within 300 training epochs. Ablation experiments further confirm the effectiveness of the channel attention module and the depth attention module. To facilitate an in-depth understanding of the features extracted by LMDA-Net, we propose class-specific neural network feature interpretability algorithms that are suitable for event-related potentials (ERPs) and event-related desynchronization/synchronization (ERD/ERS). By mapping the output of the specific layer of LMDA-Net to the time or spatial domain through class activation maps, the resulting feature visualizations can provide interpretable analysis and establish connections with EEG time-spatial analysis in neuroscience. In summary, LMDA-Net shows great potential as a general online decoding model for various EEG tasks.
    An Over-parameterized Exponential Regression. (arXiv:2303.16504v1 [cs.LG])
    Over the past few years, there has been a significant amount of research focused on studying the ReLU activation function, with the aim of achieving neural network convergence through over-parametrization. However, recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions, specifically in the attention mechanism. Mathematically, we define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function. Given a set of data points with labels $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\} \subset \mathbb{R}^d \times \mathbb{R}$ where $n$ denotes the number of the data. Here $F(W(t),x)$ can be expressed as $F(W(t),x) := \sum_{r=1}^m a_r \exp(\langle w_r, x \rangle)$, where $m$ represents the number of neurons, and $w_r(t)$ are weights at time $t$. It's standard in literature that $a_r$ are the fixed weights and it's never changed during the training. We initialize the weights $W(0) \in \mathbb{R}^{d \times m}$ with random Gaussian distributions, such that $w_r(0) \sim \mathcal{N}(0, I_d)$ and initialize $a_r$ from random sign distribution for each $r \in [m]$. Using the gradient descent algorithm, we can find a weight $W(T)$ such that $\| F(W(T), X) - y \|_2 \leq \epsilon$ holds with probability $1-\delta$, where $\epsilon \in (0,0.1)$ and $m = \Omega(n^{2+o(1)}\log(n/\delta))$. To optimize the over-parameterization bound $m$, we employ several tight analysis techniques from previous studies [Song and Yang arXiv 2019, Munteanu, Omlor, Song and Woodruff ICML 2022].
    Training Language Models with Language Feedback at Scale. (arXiv:2303.16755v1 [cs.CL])
    Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches the above issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback only conveys limited information about human preferences. In this paper, we introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback. ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements. Second, selecting the refinement incorporating the most feedback. Third, finetuning the language model to maximize the likelihood of the chosen refinement given the input. We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from human feedback. We evaluate ILF's effectiveness on a carefully-controlled toy task and a realistic summarization task. Our experiments demonstrate that large language models accurately incorporate feedback and that finetuning with ILF scales well with the dataset size, even outperforming finetuning on human summaries. Learning from both language and comparison feedback outperforms learning from each alone, achieving human-level summarization performance.
    Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning. (arXiv:2303.16778v1 [cs.CL])
    Cooking recipes allow individuals to exchange culinary ideas and provide food preparation instructions. Due to a lack of adequate labeled data, categorizing raw recipes found online to the appropriate food genres is a challenging task in this domain. Utilizing the knowledge of domain experts to categorize recipes could be a solution. In this study, we present a novel dataset of two million culinary recipes labeled in respective categories leveraging the knowledge of food experts and an active learning technique. To construct the dataset, we collect the recipes from the RecipeNLG dataset. Then, we employ three human experts whose trustworthiness score is higher than 86.667% to categorize 300K recipe by their Named Entity Recognition (NER) and assign it to one of the nine categories: bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides and fusion. Finally, we categorize the remaining 1900K recipes using Active Learning method with a blend of Query-by-Committee and Human In The Loop (HITL) approaches. There are more than two million recipes in our dataset, each of which is categorized and has a confidence score linked with it. For the 9 genres, the Fleiss Kappa score of this massive dataset is roughly 0.56026. We believe that the research community can use this dataset to perform various machine learning tasks such as recipe genre classification, recipe generation of a specific genre, new recipe creation, etc. The dataset can also be used to train and evaluate the performance of various NLP tasks such as named entity recognition, part-of-speech tagging, semantic role labeling, and so on. The dataset will be available upon publication: https://tinyurl.com/3zu4778y.
    Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for Composite Convex Optimization. (arXiv:2303.16279v1 [math.OC])
    Exploiting partial first-order information in a cyclic way is arguably the most natural strategy to obtain scalable first-order methods. However, despite their wide use in practice, cyclic schemes are far less understood from a theoretical perspective than their randomized counterparts. Motivated by a recent success in analyzing an extrapolated cyclic scheme for generalized variational inequalities, we propose an Accelerated Cyclic Coordinate Dual Averaging with Extrapolation (A-CODER) method for composite convex optimization, where the objective function can be expressed as the sum of a smooth convex function accessible via a gradient oracle and a convex, possibly nonsmooth, function accessible via a proximal oracle. We show that A-CODER attains the optimal convergence rate with improved dependence on the number of blocks compared to prior work. Furthermore, for the setting where the smooth component of the objective function is expressible in a finite sum form, we introduce a variance-reduced variant of A-CODER, VR-A-CODER, with state-of-the-art complexity guarantees. Finally, we demonstrate the effectiveness of our algorithms through numerical experiments.  ( 2 min )
    ProductAE: Toward Deep Learning Driven Error-Correction Codes of Large Dimensions. (arXiv:2303.16424v1 [cs.IT])
    While decades of theoretical research have led to the invention of several classes of error-correction codes, the design of such codes is an extremely challenging task, mostly driven by human ingenuity. Recent studies demonstrate that such designs can be effectively automated and accelerated via tools from machine learning (ML), thus enabling ML-driven classes of error-correction codes with promising performance gains compared to classical designs. A fundamental challenge, however, is that it is prohibitively complex, if not impossible, to design and train fully ML-driven encoder and decoder pairs for large code dimensions. In this paper, we propose Product Autoencoder (ProductAE) -- a computationally-efficient family of deep learning driven (encoder, decoder) pairs -- aimed at enabling the training of relatively large codes (both encoder and decoder) with a manageable training complexity. We build upon ideas from classical product codes and propose constructing large neural codes using smaller code components. ProductAE boils down the complex problem of training the encoder and decoder for a large code dimension $k$ and blocklength $n$ to less-complex sub-problems of training encoders and decoders for smaller dimensions and blocklengths. Our training results show successful training of ProductAEs of dimensions as large as $k = 300$ bits with meaningful performance gains compared to state-of-the-art classical and neural designs. Moreover, we demonstrate excellent robustness and adaptivity of ProductAEs to channel models different than the ones used for training.  ( 3 min )
    Are Data-driven Explanations Robust against Out-of-distribution Data?. (arXiv:2303.16390v1 [cs.LG])
    As black-box models increasingly power high-stakes applications, a variety of data-driven explanation methods have been introduced. Meanwhile, machine learning models are constantly challenged by distributional shifts. A question naturally arises: Are data-driven explanations robust against out-of-distribution data? Our empirical results show that even though predict correctly, the model might still yield unreliable explanations under distributional shifts. How to develop robust explanations against out-of-distribution data? To address this problem, we propose an end-to-end model-agnostic learning framework Distributionally Robust Explanations (DRE). The key idea is, inspired by self-supervised learning, to fully utilizes the inter-distribution information to provide supervisory signals for the learning of explanations without human annotation. Can robust explanations benefit the model's generalization capability? We conduct extensive experiments on a wide range of tasks and data types, including classification and regression on image and scientific tabular data. Our results demonstrate that the proposed method significantly improves the model's performance in terms of explanation and prediction robustness against distributional shifts.  ( 2 min )
    Non-Asymptotic Lower Bounds For Training Data Reconstruction. (arXiv:2303.16372v1 [cs.LG])
    We investigate semantic guarantees of private learning algorithms for their resilience to training Data Reconstruction Attacks (DRAs) by informed adversaries. To this end, we derive non-asymptotic minimax lower bounds on the adversary's reconstruction error against learners that satisfy differential privacy (DP) and metric differential privacy (mDP). Furthermore, we demonstrate that our lower bound analysis for the latter also covers the high dimensional regime, wherein, the input data dimensionality may be larger than the adversary's query budget. Motivated by the theoretical improvements conferred by metric DP, we extend the privacy analysis of popular deep learning algorithms such as DP-SGD and Projected Noisy SGD to cover the broader notion of metric differential privacy.  ( 2 min )
    Combinatorial Convolutional Neural Networks for Words. (arXiv:2303.16211v1 [cs.LG])
    The paper discusses the limitations of deep learning models in identifying and utilizing features that remain invariant under a bijective transformation on the data entries, which we refer to as combinatorial patterns. We argue that the identification of such patterns may be important for certain applications and suggest providing neural networks with information that fully describes the combinatorial patterns of input entries and allows the network to determine what is relevant for prediction. To demonstrate the feasibility of this approach, we present a combinatorial convolutional neural network for word classification.  ( 2 min )
    ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales. (arXiv:2303.16245v1 [cs.DC])
    As we enter the exascale computing era, efficiently utilizing power and optimizing the performance of scientific applications under power and energy constraints has become critical and challenging. We propose a low-overhead autotuning framework to autotune performance and energy for various hybrid MPI/OpenMP scientific applications at large scales and to explore the tradeoffs between application runtime and power/energy for energy efficient application execution, then use this framework to autotune four ECP proxy applications -- XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with a Random Forest surrogate model to effectively search parameter spaces with up to 6 million different configurations on two large-scale production systems, Theta at Argonne National Laboratory and Summit at Oak Ridge National Laboratory. The experimental results show that our autotuning framework at large scales has low overhead and achieves good scalability. Using the proposed autotuning framework to identify the best configurations, we achieve up to 91.59% performance improvement, up to 21.2% energy savings, and up to 37.84% EDP improvement on up to 4,096 nodes.  ( 2 min )
    Tetra-AML: Automatic Machine Learning via Tensor Networks. (arXiv:2303.16214v1 [cs.LG])
    Neural networks have revolutionized many aspects of society but in the era of huge models with billions of parameters, optimizing and deploying them for commercial applications can require significant computational and financial resources. To address these challenges, we introduce the Tetra-AML toolbox, which automates neural architecture search and hyperparameter optimization via a custom-developed black-box Tensor train Optimization algorithm, TetraOpt. The toolbox also provides model compression through quantization and pruning, augmented by compression using tensor networks. Here, we analyze a unified benchmark for optimizing neural networks in computer vision tasks and show the superior performance of our approach compared to Bayesian optimization on the CIFAR-10 dataset. We also demonstrate the compression of ResNet-18 neural networks, where we use 14.5 times less memory while losing just 3.2% of accuracy. The presented framework is generic, not limited by computer vision problems, supports hardware acceleration (such as with GPUs and TPUs) and can be further extended to quantum hardware and to hybrid quantum machine learning models.  ( 2 min )
    A Perspectival Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, Wikipedia, and YouTube. (arXiv:2303.16281v1 [cs.CY])
    Contrary to Google Search's mission of delivering information from "many angles so you can form your own understanding of the world," we find that Google and its most prominent returned results -- Wikipedia and YouTube, simply reflect the narrow set of cultural stereotypes tied to the search language for complex topics like "Buddhism," "Liberalism," "colonization," "Iran" and "America." Simply stated, they present, to varying degrees, distinct information across the same search in different languages (we call it 'language bias'). Instead of presenting a global picture of a complex topic, our online searches turn us into the proverbial blind person touching a small portion of an elephant, ignorant of the existence of other cultural perspectives. The language we use to search ends up as a cultural filter to promote ethnocentric views, where a person evaluates other people or ideas based on their own culture. We also find that language bias is deeply embedded in ChatGPT. As it is primarily trained on English language data, it presents the Anglo-American perspective as the normative view, reducing the complexity of a multifaceted issue to the single Anglo-American standard. In this paper, we present evidence and analysis of language bias and discuss its larger social implications. Toward the end of the paper, we propose a potential framework of using automatic translation to leverage language bias and argue that the task of piecing together a genuine depiction of the elephant is a challenging and important endeavor that deserves a new area of research in NLP and requires collaboration with scholars from the humanities to create ethically sound and socially responsible technology together.  ( 3 min )
    Crime Prediction Using Machine Learning and Deep Learning: A Systematic Review and Future Directions. (arXiv:2303.16310v1 [cs.LG])
    Predicting crime using machine learning and deep learning techniques has gained considerable attention from researchers in recent years, focusing on identifying patterns and trends in crime occurrences. This review paper examines over 150 articles to explore the various machine learning and deep learning algorithms applied to predict crime. The study provides access to the datasets used for crime prediction by researchers and analyzes prominent approaches applied in machine learning and deep learning algorithms to predict crime, offering insights into different trends and factors related to criminal activities. Additionally, the paper highlights potential gaps and future directions that can enhance the accuracy of crime prediction. Finally, the comprehensive overview of research discussed in this paper on crime prediction using machine learning and deep learning approaches serves as a valuable reference for researchers in this field. By gaining a deeper understanding of crime prediction techniques, law enforcement agencies can develop strategies to prevent and respond to criminal activities more effectively.  ( 3 min )
    Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples. (arXiv:2303.16270v1 [cs.LG])
    Federated learning is a popular collaborative learning approach that enables clients to train a global model without sharing their local data. Vertical federated learning (VFL) deals with scenarios in which the data on clients have different feature spaces but share some overlapping samples. Existing VFL approaches suffer from high communication costs and cannot deal efficiently with limited overlapping samples commonly seen in the real world. We propose a practical vertical federated learning (VFL) framework called \textbf{one-shot VFL} that can solve the communication bottleneck and the problem of limited overlapping samples simultaneously based on semi-supervised learning. We also propose \textbf{few-shot VFL} to improve the accuracy further with just one more communication round between the server and the clients. In our proposed framework, the clients only need to communicate with the server once or only a few times. We evaluate the proposed VFL framework on both image and tabular datasets. Our methods can improve the accuracy by more than 46.5\% and reduce the communication cost by more than 330$\times$ compared with state-of-the-art VFL methods when evaluated on CIFAR-10. Our code will be made publicly available at \url{https://nvidia.github.io/NVFlare/research/one-shot-vfl}.  ( 2 min )
    AmorProt: Amino Acid Molecular Fingerprints Repurposing based Protein Fingerprint. (arXiv:2303.16209v1 [q-bio.QM])
    As protein therapeutics play an important role in almost all medical fields, numerous studies have been conducted on proteins using artificial intelligence. Artificial intelligence has enabled data driven predictions without the need for expensive experiments. Nevertheless, unlike the various molecular fingerprint algorithms that have been developed, protein fingerprint algorithms have rarely been studied. In this study, we proposed the amino acid molecular fingerprints repurposing based protein (AmorProt) fingerprint, a protein sequence representation method that effectively uses the molecular fingerprints corresponding to 20 amino acids. Subsequently, the performances of the tree based machine learning and artificial neural network models were compared using (1) amyloid classification and (2) isoelectric point regression. Finally, the applicability and advantages of the developed platform were demonstrated through a case study and the following experiments: (3) comparison of dataset dependence with feature based methods; (4) feature importance analysis; and (5) protein space analysis. Consequently, the significantly improved model performance and data set independent versatility of the AmorProt fingerprint were verified. The results revealed that the current protein representation method can be applied to various fields related to proteins, such as predicting their fundamental properties or interaction with ligands.  ( 2 min )
    Function Approximation with Randomly Initialized Neural Networks for Approximate Model Reference Adaptive Control. (arXiv:2303.16251v1 [math.OC])
    Classical results in neural network approximation theory show how arbitrary continuous functions can be approximated by networks with a single hidden layer, under mild assumptions on the activation function. However, the classical theory does not give a constructive means to generate the network parameters that achieve a desired accuracy. Recent results have demonstrated that for specialized activation functions, such as ReLUs and some classes of analytic functions, high accuracy can be achieved via linear combinations of randomly initialized activations. These recent works utilize specialized integral representations of target functions that depend on the specific activation functions used. This paper defines mollified integral representations, which provide a means to form integral representations of target functions using activations for which no direct integral representation is currently known. The new construction enables approximation guarantees for randomly initialized networks for a variety of widely used activation functions.  ( 2 min )
    Provable Robustness for Streaming Models with a Sliding Window. (arXiv:2303.16308v1 [cs.LG])
    The literature on provable robustness in machine learning has primarily focused on static prediction problems, such as image classification, in which input samples are assumed to be independent and model performance is measured as an expectation over the input distribution. Robustness certificates are derived for individual input instances with the assumption that the model is evaluated on each instance separately. However, in many deep learning applications such as online content recommendation and stock market analysis, models use historical data to make predictions. Robustness certificates based on the assumption of independent input samples are not directly applicable in such scenarios. In this work, we focus on the provable robustness of machine learning models in the context of data streams, where inputs are presented as a sequence of potentially correlated items. We derive robustness certificates for models that use a fixed-size sliding window over the input stream. Our guarantees hold for the average model performance across the entire stream and are independent of stream size, making them suitable for large data streams. We perform experiments on speech detection and human activity recognition tasks and show that our certificates can produce meaningful performance guarantees against adversarial perturbations.  ( 2 min )
    Accelerated wind farm yaw and layout optimisation with multi-fidelity deep transfer learning wake models. (arXiv:2303.16274v1 [cs.LG])
    Wind farm modelling has been an area of rapidly increasing interest with numerous analytical as well as computational-based approaches developed to extend the margins of wind farm efficiency and maximise power production. In this work, we present the novel ML framework WakeNet, which can reproduce generalised 2D turbine wake velocity fields at hub-height over a wide range of yaw angles, wind speeds and turbulence intensities (TIs), with a mean accuracy of 99.8% compared to the solution calculated using the state-of-the-art wind farm modelling software FLORIS. As the generation of sufficient high-fidelity data for network training purposes can be cost-prohibitive, the utility of multi-fidelity transfer learning has also been investigated. Specifically, a network pre-trained on the low-fidelity Gaussian wake model is fine-tuned in order to obtain accurate wake results for the mid-fidelity Curl wake model. The robustness and overall performance of WakeNet on various wake steering control and layout optimisation scenarios has been validated through power-gain heatmaps, obtaining at least 90% of the power gained through optimisation performed with FLORIS directly. We also demonstrate that when utilising the Curl model, WakeNet is able to provide similar power gains to FLORIS, two orders of magnitude faster (e.g. 10 minutes vs 36 hours per optimisation case). The wake evaluation time of wakeNet when trained on a high-fidelity CFD dataset is expected to be similar, thus further increasing computational time gains. These promising results show that generalised wake modelling with ML tools can be accurate enough to contribute towards active yaw and layout optimisation, while producing realistic optimised configurations at a fraction of the computational cost, hence making it feasible to perform real-time active yaw control as well as robust optimisation under uncertainty.  ( 3 min )
    XAIR: A Framework of Explainable AI in Augmented Reality. (arXiv:2303.16292v1 [cs.HC])
    Explainable AI (XAI) has established itself as an important component of AI-driven interactive systems. With Augmented Reality (AR) becoming more integrated in daily lives, the role of XAI also becomes essential in AR because end-users will frequently interact with intelligent services. However, it is unclear how to design effective XAI experiences for AR. We propose XAIR, a design framework that addresses "when", "what", and "how" to provide explanations of AI output in AR. The framework was based on a multi-disciplinary literature review of XAI and HCI research, a large-scale survey probing 500+ end-users' preferences for AR-based explanations, and three workshops with 12 experts collecting their insights about XAI design in AR. XAIR's utility and effectiveness was verified via a study with 10 designers and another study with 12 end-users. XAIR can provide guidelines for designers, inspiring them to identify new design opportunities and achieve effective XAI designs in AR.  ( 2 min )
    TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition. (arXiv:2303.16268v1 [cs.CV])
    Semi-Supervised Learning can be more beneficial for the video domain compared to images because of its higher annotation cost and dimensionality. Besides, any video understanding task requires reasoning over both spatial and temporal dimensions. In order to learn both the static and motion related features for the semi-supervised action recognition task, existing methods rely on hard input inductive biases like using two-modalities (RGB and Optical-flow) or two-stream of different playback rates. Instead of utilizing unlabeled videos through diverse input streams, we rely on self-supervised video representations, particularly, we utilize temporally-invariant and temporally-distinctive representations. We observe that these representations complement each other depending on the nature of the action. Based on this observation, we propose a student-teacher semi-supervised learning framework, TimeBalance, where we distill the knowledge from a temporally-invariant and a temporally-distinctive teacher. Depending on the nature of the unlabeled video, we dynamically combine the knowledge of these two teachers based on a novel temporal similarity-based reweighting scheme. Our method achieves state-of-the-art performance on three action recognition benchmarks: UCF101, HMDB51, and Kinetics400. Code: https://github.com/DAVEISHAN/TimeBalance  ( 2 min )
  • Open

    Learning Identity-Preserving Transformations on Data Manifolds. (arXiv:2106.12096v2 [cs.LG] UPDATED)
    Many machine learning techniques incorporate identity-preserving transformations into their models to generalize their performance to previously unseen data. These transformations are typically selected from a set of functions that are known to maintain the identity of an input when applied (e.g., rotation, translation, flipping, and scaling). However, there are many natural variations that cannot be labeled for supervision or defined through examination of the data. As suggested by the manifold hypothesis, many of these natural variations live on or near a low-dimensional, nonlinear manifold. Several techniques represent manifold variations through a set of learned Lie group operators that define directions of motion on the manifold. However, these approaches are limited because they require transformation labels when training their models and they lack a method for determining which regions of the manifold are appropriate for applying each specific operator. We address these limitations by introducing a learning strategy that does not require transformation labels and developing a method that learns the local regions where each operator is likely to be used while preserving the identity of inputs. Experiments on MNIST and Fashion MNIST highlight our model's ability to learn identity-preserving transformations on multi-class datasets. Additionally, we train on CelebA to showcase our model's ability to learn semantically meaningful transformations on complex datasets in an unsupervised manner.  ( 3 min )
    Attention in a family of Boltzmann machines emerging from modern Hopfield networks. (arXiv:2212.04692v2 [cs.LG] UPDATED)
    Hopfield networks and Boltzmann machines (BMs) are fundamental energy-based neural network models. Recent studies on modern Hopfield networks have broaden the class of energy functions and led to a unified perspective on general Hopfield networks including an attention module. In this letter, we consider the BM counterparts of modern Hopfield networks using the associated energy functions, and study their salient properties from a trainability perspective. In particular, the energy function corresponding to the attention module naturally introduces a novel BM, which we refer to as the attentional BM (AttnBM). We verify that AttnBM has a tractable likelihood function and gradient for certain special cases and is easy to train. Moreover, we reveal the hidden connections between AttnBM and some single-layer models, namely the Gaussian--Bernoulli restricted BM and the denoising autoencoder with softmax units coming from denoising score matching. We also investigate BMs introduced by other energy functions and show that the energy function of dense associative memory models gives BMs belonging to Exponential Family Harmoniums.  ( 3 min )
    Diffusion Schr\"odinger Bridge Matching. (arXiv:2303.16852v1 [stat.ML])
    Solving transport problems, i.e. finding a map transporting one given distribution to another, has numerous applications in machine learning. Novel mass transport methods motivated by generative modeling have recently been proposed, e.g. Denoising Diffusion Models (DDMs) and Flow Matching Models (FMMs) implement such a transport through a Stochastic Differential Equation (SDE) or an Ordinary Differential Equation (ODE). However, while it is desirable in many applications to approximate the deterministic dynamic Optimal Transport (OT) map which admits attractive properties, DDMs and FMMs are not guaranteed to provide transports close to the OT map. In contrast, Schr\"odinger bridges (SBs) compute stochastic dynamic mappings which recover entropy-regularized versions of OT. Unfortunately, existing numerical methods approximating SBs either scale poorly with dimension or accumulate errors across iterations. In this work, we introduce Iterative Markovian Fitting, a new methodology for solving SB problems, and Diffusion Schr\"odinger Bridge Matching (DSBM), a novel numerical algorithm for computing IMF iterates. DSBM significantly improves over previous SB numerics and recovers as special/limiting cases various recent transport methods. We demonstrate the performance of DSBM on a variety of problems.  ( 2 min )
    Provable Robustness for Streaming Models with a Sliding Window. (arXiv:2303.16308v1 [cs.LG])
    The literature on provable robustness in machine learning has primarily focused on static prediction problems, such as image classification, in which input samples are assumed to be independent and model performance is measured as an expectation over the input distribution. Robustness certificates are derived for individual input instances with the assumption that the model is evaluated on each instance separately. However, in many deep learning applications such as online content recommendation and stock market analysis, models use historical data to make predictions. Robustness certificates based on the assumption of independent input samples are not directly applicable in such scenarios. In this work, we focus on the provable robustness of machine learning models in the context of data streams, where inputs are presented as a sequence of potentially correlated items. We derive robustness certificates for models that use a fixed-size sliding window over the input stream. Our guarantees hold for the average model performance across the entire stream and are independent of stream size, making them suitable for large data streams. We perform experiments on speech detection and human activity recognition tasks and show that our certificates can produce meaningful performance guarantees against adversarial perturbations.  ( 2 min )
    Non-Asymptotic Lower Bounds For Training Data Reconstruction. (arXiv:2303.16372v1 [cs.LG])
    We investigate semantic guarantees of private learning algorithms for their resilience to training Data Reconstruction Attacks (DRAs) by informed adversaries. To this end, we derive non-asymptotic minimax lower bounds on the adversary's reconstruction error against learners that satisfy differential privacy (DP) and metric differential privacy (mDP). Furthermore, we demonstrate that our lower bound analysis for the latter also covers the high dimensional regime, wherein, the input data dimensionality may be larger than the adversary's query budget. Motivated by the theoretical improvements conferred by metric DP, we extend the privacy analysis of popular deep learning algorithms such as DP-SGD and Projected Noisy SGD to cover the broader notion of metric differential privacy.  ( 2 min )
    Optimal approximation of $C^k$-functions using shallow complex-valued neural networks. (arXiv:2303.16813v1 [math.FA])
    We prove a quantitative result for the approximation of functions of regularity $C^k$ (in the sense of real variables) defined on the complex cube $\Omega_n := [-1,1]^n +i[-1,1]^n\subseteq \mathbb{C}^n$ using shallow complex-valued neural networks. Precisely, we consider neural networks with a single hidden layer and $m$ neurons, i.e., networks of the form $z \mapsto \sum_{j=1}^m \sigma_j \cdot \phi\big(\rho_j^T z + b_j\big)$ and show that one can approximate every function in $C^k \left( \Omega_n; \mathbb{C}\right)$ using a function of that form with error of the order $m^{-k/(2n)}$ as $m \to \infty$, provided that the activation function $\phi: \mathbb{C} \to \mathbb{C}$ is smooth but not polyharmonic on some non-empty open set. Furthermore, we show that the selection of the weights $\sigma_j, b_j \in \mathbb{C}$ and $\rho_j \in \mathbb{C}^n$ is continuous with respect to $f$ and prove that the derived rate of approximation is optimal under this continuity assumption. We also discuss the optimality of the result for a possibly discontinuous choice of the weights.  ( 2 min )
    Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation. (arXiv:2207.04475v2 [stat.ML] UPDATED)
    This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$ for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated by (asymptotically) unbiased observations $\{(\mathbf{A}(Z_n),\mathbf{b}(Z_n))\}_{n \in \mathbb{N}}$. We consider here the case where $\{Z_n\}_{n \in \mathbb{N}}$ is an i.i.d. sequence or a uniformly geometrically ergodic Markov chain. We derive $p$-th moment and high-probability deviation bounds for the iterates defined by LSA and its Polyak-Ruppert-averaged version. Our finite-time instance-dependent bounds for the averaged LSA iterates are sharp in the sense that the leading term we obtain coincides with the local asymptotic minimax limit. Moreover, the remainder terms of our bounds admit a tight dependence on the mixing time $t_{\operatorname{mix}}$ of the underlying chain and the norm of the noise variables. We emphasize that our result requires the SA step size to scale only with logarithm of the problem dimension $d$.  ( 2 min )
    A Byzantine-Resilient Aggregation Scheme for Federated Learning via Matrix Autoregression on Client Updates. (arXiv:2303.16668v1 [cs.LG])
    In this work, we propose FLANDERS, a novel federated learning (FL) aggregation scheme robust to Byzantine attacks. FLANDERS considers the local model updates sent by clients at each FL round as a matrix-valued time series. Then, it identifies malicious clients as outliers of this time series by comparing actual observations with those estimated by a matrix autoregressive forecasting model. Experiments conducted on several datasets under different FL settings demonstrate that FLANDERS matches the robustness of the most powerful baselines against Byzantine clients. Furthermore, FLANDERS remains highly effective even under extremely severe attack scenarios, as opposed to existing defense strategies.
    An inexact linearized proximal algorithm for a class of DC composite optimization problems and applications. (arXiv:2303.16822v1 [math.OC])
    This paper is concerned with a class of DC composite optimization problems which, as an extension of the convex composite optimization problem and the DC program with nonsmooth components, often arises from robust factorization models of low-rank matrix recovery. For this class of nonconvex and nonsmooth problems, we propose an inexact linearized proximal algorithm (iLPA) which in each step computes an inexact minimizer of a strongly convex majorization constructed by the partial linearization of their objective functions. The generated iterate sequence is shown to be convergent under the Kurdyka-{\L}ojasiewicz (KL) property of a potential function, and the convergence admits a local R-linear rate if the potential function has the KL property of exponent $1/2$ at the limit point. For the latter assumption, we provide a verifiable condition by leveraging the composite structure, and clarify its relation with the regularity used for the convex composite optimization. Finally, the proposed iLPA is applied to a robust factorization model for matrix completions with outliers, DC programs with nonsmooth components, and $\ell_1$-norm exact penalty of DC constrained programs, and numerical comparison with the existing algorithms confirms the superiority of our iLPA in computing time and quality of solutions.
    Global Convergence of Over-parameterized Deep Equilibrium Models. (arXiv:2205.13814v2 [cs.LG] UPDATED)
    A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection. Instead of infinite computations, it solves an equilibrium point directly with root-finding and computes gradients with implicit differentiation. The training dynamics of over-parameterized DEQs are investigated in this study. By supposing a condition on the initial equilibrium point, we show that the unique equilibrium point always exists during the training process, and the gradient descent is proved to converge to a globally optimal solution at a linear convergence rate for the quadratic loss function. In order to show that the required initial condition is satisfied via mild over-parameterization, we perform a fine-grained analysis on random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
    Nonlinear Independent Component Analysis for Principled Disentanglement in Unsupervised Deep Learning. (arXiv:2303.16535v1 [cs.LG])
    A central problem in unsupervised deep learning is how to find useful representations of high-dimensional data, sometimes called "disentanglement". Most approaches are heuristic and lack a proper theoretical foundation. In linear representation learning, independent component analysis (ICA) has been successful in many applications areas, and it is principled, i.e. based on a well-defined probabilistic model. However, extension of ICA to the nonlinear case has been problematic due to the lack of identifiability, i.e. uniqueness of the representation. Recently, nonlinear extensions that utilize temporal structure or some auxiliary information have been proposed. Such models are in fact identifiable, and consequently, an increasing number of algorithms have been developed. In particular, some self-supervised algorithms can be shown to estimate nonlinear ICA, even though they have initially been proposed from heuristic perspectives. This paper reviews the state-of-the-art of nonlinear ICA theory and algorithms.
    Context-aware Bayesian Mixed Multinomial Logit Model. (arXiv:2210.05737v2 [stat.ML] UPDATED)
    The mixed multinomial logit model assumes constant preference parameters of a decision-maker throughout different choice situations, which may be considered too strong for certain choice modelling applications. This paper proposes an effective approach to model context-dependent intra-respondent heterogeneity, thereby introducing the concept of the Context-aware Bayesian mixed multinomial logit model, where a neural network maps contextual information to interpretable shifts in the preference parameters of each individual in each choice occasion. The proposed model offers several key advantages. First, it supports both continuous and discrete variables, as well as complex non-linear interactions between both types of variables. Secondly, each context specification is considered jointly as a whole by the neural network rather than each variable being considered independently. Finally, since the neural network parameters are shared across all decision-makers, it can leverage information from other decision-makers to infer the effect of a particular context on a particular decision-maker. Even though the context-aware Bayesian mixed multinomial logit model allows for flexible interactions between attributes, the increase in computational complexity is minor, compared to the mixed multinomial logit model. We illustrate the concept and interpretation of the proposed model in a simulation study. We furthermore present a real-world case study from the travel behaviour domain - a bicycle route choice model, based on a large-scale, crowdsourced dataset of GPS trajectories including 119,448 trips made by 8,555 cyclists.
    PAC-Bayesian bounds for learning LTI-ss systems with input from empirical loss. (arXiv:2303.16816v1 [stat.ML])
    In this paper we derive a Probably Approxilmately Correct(PAC)-Bayesian error bound for linear time-invariant (LTI) stochastic dynamical systems with inputs. Such bounds are widespread in machine learning, and they are useful for characterizing the predictive power of models learned from finitely many data points. In particular, with the bound derived in this paper relates future average prediction errors with the prediction error generated by the model on the data used for learning. In turn, this allows us to provide finite-sample error bounds for a wide class of learning/system identification algorithms. Furthermore, as LTI systems are a sub-class of recurrent neural networks (RNNs), these error bounds could be a first step towards PAC-Bayesian bounds for RNNs.  ( 2 min )
    Convergence of Momentum-Based Heavy Ball Method with Batch Updating and/or Approximate Gradients. (arXiv:2303.16241v1 [math.OC])
    In this paper, we study the well-known "Heavy Ball" method for convex and nonconvex optimization introduced by Polyak in 1964, and establish its convergence under a variety of situations. Traditionally, most algorthms use "full-coordinate update," that is, at each step, very component of the argument is updated. However, when the dimension of the argument is very high, it is more efficient to update some but not all components of the argument at each iteration. We refer to this as "batch updating" in this paper. When gradient-based algorithms are used together with batch updating, in principle it is sufficient to compute only those components of the gradient for which the argument is to be updated. However, if a method such as back propagation is used to compute these components, computing only some components of gradient does not offer much savings over computing the entire gradient. Therefore, to achieve a noticeable reduction in CPU usage at each step, one can use first-order differences to approximate the gradient. The resulting estimates are biased, and also have unbounded variance. Thus some delicate analysis is required to ensure that the HB algorithm converge when batch updating is used instead of full-coordinate updating, and/or approximate gradients are used instead of true gradients. In this paper, we not only establish the almost sure convergence of the iterations to the stationary point(s) of the objective function, but also derive upper bounds on the rate of convergence. To the best of our knowledge, there is no other paper that combines all of these features.  ( 3 min )
    An Over-parameterized Exponential Regression. (arXiv:2303.16504v1 [cs.LG])
    Over the past few years, there has been a significant amount of research focused on studying the ReLU activation function, with the aim of achieving neural network convergence through over-parametrization. However, recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions, specifically in the attention mechanism. Mathematically, we define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function. Given a set of data points with labels $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\} \subset \mathbb{R}^d \times \mathbb{R}$ where $n$ denotes the number of the data. Here $F(W(t),x)$ can be expressed as $F(W(t),x) := \sum_{r=1}^m a_r \exp(\langle w_r, x \rangle)$, where $m$ represents the number of neurons, and $w_r(t)$ are weights at time $t$. It's standard in literature that $a_r$ are the fixed weights and it's never changed during the training. We initialize the weights $W(0) \in \mathbb{R}^{d \times m}$ with random Gaussian distributions, such that $w_r(0) \sim \mathcal{N}(0, I_d)$ and initialize $a_r$ from random sign distribution for each $r \in [m]$. Using the gradient descent algorithm, we can find a weight $W(T)$ such that $\| F(W(T), X) - y \|_2 \leq \epsilon$ holds with probability $1-\delta$, where $\epsilon \in (0,0.1)$ and $m = \Omega(n^{2+o(1)}\log(n/\delta))$. To optimize the over-parameterization bound $m$, we employ several tight analysis techniques from previous studies [Song and Yang arXiv 2019, Munteanu, Omlor, Song and Woodruff ICML 2022].  ( 2 min )
    Maximum likelihood smoothing estimation in state-space models: An incomplete-information based approach. (arXiv:2303.16364v1 [stat.ME])
    This paper revisits classical works of Rauch (1963, et al. 1965) and develops a novel method for maximum likelihood (ML) smoothing estimation from incomplete information/data of stochastic state-space systems. Score function and conditional observed information matrices of incomplete data are introduced and their distributional identities are established. Using these identities, the ML smoother $\widehat{x}_{k\vert n}^s =\argmax_{x_k} \log f(x_k,\widehat{x}_{k+1\vert n}^s, y_{0:n}\vert\theta)$, $k\leq n-1$, is presented. The result shows that the ML smoother gives an estimate of state $x_k$ with more adherence of loglikehood having less standard errors than that of the ML state estimator $\widehat{x}_k=\argmax_{x_k} \log f(x_k,y_{0:k}\vert\theta)$, with $\widehat{x}_{n\vert n}^s=\widehat{x}_n$. Recursive estimation is given in terms of an EM-gradient-particle algorithm which extends the work of \cite{Lange} for ML smoothing estimation. The algorithm has an explicit iteration update which lacks in (\cite{Ramadan}) EM-algorithm for smoothing. A sequential Monte Carlo method is developed for valuation of the score function and observed information matrices. A recursive equation for the covariance matrix of estimation error is developed to calculate the standard errors. In the case of linear systems, the method shows that the Rauch-Tung-Striebel (RTS) smoother is a fully efficient smoothing state-estimator whose covariance matrix coincides with the Cram\'er-Rao lower bound, the inverse of expected information matrix. Furthermore, the RTS smoother coincides with the Kalman filter having less covariance matrix. Numerical studies are performed, confirming the accuracy of the main results.  ( 3 min )
    Correcting for Selection Bias and Missing Response in Regression using Privileged Information. (arXiv:2303.16800v1 [math.ST])
    When estimating a regression model, we might have data where some labels are missing, or our data might be biased by a selection mechanism. When the response or selection mechanism is ignorable (i.e., independent of the response variable given the features) one can use off-the-shelf regression methods; in the nonignorable case one typically has to adjust for bias. We observe that privileged data (i.e. data that is only available during training) might render a nonignorable selection mechanism ignorable, and we refer to this scenario as Privilegedly Missing at Random (PMAR). We propose a novel imputation-based regression method, named repeated regression, that is suitable for PMAR. We also consider an importance weighted regression method, and a doubly robust combination of the two. The proposed methods are easy to implement with most popular out-of-the-box regression algorithms. We empirically assess the performance of the proposed methods with extensive simulated experiments and on a synthetically augmented real-world dataset. We conclude that repeated regression can appropriately correct for bias, and can have considerable advantage over weighted regression, especially when extrapolating to regions of the feature space where response is never observed.  ( 2 min )
    Minimal Width for Universal Property of Deep RNN. (arXiv:2211.13866v2 [stat.ML] UPDATED)
    A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data. Imitating a dynamical system, an infinite-width RNN can approximate any open dynamical system in a compact domain. In general, deep networks with bounded widths are more effective than wide networks in practice; however, the universal approximation theorem for deep narrow structures has yet to be extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound of the minimum width for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or $L^p$ function with the widths $d_x+d_y+2$ and $\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also compute the additional width required if the activation function is $\tanh$ or more. In addition, we prove the universality of other recurrent networks, such as bidirectional RNNs. Bridging a multi-layer perceptron and an RNN, our theory and proof technique can be an initial step toward further research on deep RNNs.  ( 2 min )
    Lifting uniform learners via distributional decomposition. (arXiv:2303.16208v1 [stat.ML])
    We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$, running in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose pmfs are computed by depth-$d$ decision trees, where $m$ is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from $\mathcal{D}$, and for general ones it uses subcube conditioning samples. A key technical ingredient is an algorithm which, given the aforementioned access to $\mathcal{D}$, produces an optimal decision tree decomposition of $\mathcal{D}$: an approximation of $\mathcal{D}$ as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art -- results of independent interest in distribution learning.  ( 2 min )
    Interpolating Discriminant Functions in High-Dimensional Gaussian Latent Mixtures. (arXiv:2210.14347v2 [stat.ML] UPDATED)
    This paper considers binary classification of high-dimensional features under a postulated model with a low-dimensional latent Gaussian mixture structure and non-vanishing noise. A generalized least squares estimator is used to estimate the direction of the optimal separating hyperplane. The estimated hyperplane is shown to interpolate on the training data. While the direction vector can be consistently estimated as could be expected from recent results in linear regression, a naive plug-in estimate fails to consistently estimate the intercept. A simple correction, that requires an independent hold-out sample, renders the procedure minimax optimal in many scenarios. The interpolation property of the latter procedure can be retained, but surprisingly depends on the way the labels are encoded.  ( 2 min )
    One-Step Estimation of Differentiable Hilbert-Valued Parameters. (arXiv:2303.16711v1 [math.ST])
    We present estimators for smooth Hilbert-valued parameters, where smoothness is characterized by a pathwise differentiability condition. When the parameter space is a reproducing kernel Hilbert space, we provide a means to obtain efficient, root-n rate estimators and corresponding confidence sets. These estimators correspond to generalizations of cross-fitted one-step estimators based on Hilbert-valued efficient influence functions. We give theoretical guarantees even when arbitrary estimators of nuisance functions are used, including those based on machine learning techniques. We show that these results naturally extend to Hilbert spaces that lack a reproducing kernel, as long as the parameter has an efficient influence function. However, we also uncover the unfortunate fact that, when there is no reproducing kernel, many interesting parameters fail to have an efficient influence function, even though they are pathwise differentiable. To handle these cases, we propose a regularized one-step estimator and associated confidence sets. We also show that pathwise differentiability, which is a central requirement of our approach, holds in many cases. Specifically, we provide multiple examples of pathwise differentiable parameters and develop corresponding estimators and confidence sets. Among these examples, four are particularly relevant to ongoing research by the causal inference community: the counterfactual density function, dose-response function, conditional average treatment effect function, and counterfactual kernel mean embedding.  ( 2 min )
    Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits. (arXiv:2206.05404v3 [stat.ML] UPDATED)
    We propose a linear contextual bandit algorithm with $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ isthe time horizon. Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, our proposed estimator takes contributions either from contexts of all arms or from selected contexts. We establish a self-normalized bound for our estimator, which allows a novel decomposition of the cumulative regret into \textit{additive} dimension-dependent terms instead of multiplicative terms. We also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem setting. Hence, the regret of our proposed algorithm matches the lower bound up to logarithmic factors. The numerical experiments support the theoretical guarantees and show that our proposed method outperforms the existing linear bandit algorithms.  ( 2 min )
    Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints. (arXiv:2303.16510v1 [stat.ML])
    Orthogonality constraints naturally appear in many machine learning problems, from Principal Components Analysis to robust neural network training. They are usually solved using Riemannian optimization algorithms, which minimize the objective function while enforcing the constraint. However, enforcing the orthogonality constraint can be the most time-consuming operation in such algorithms. Recently, Ablin & Peyr\'e (2022) proposed the Landing algorithm, a method with cheap iterations that does not enforce the orthogonality constraint but is attracted towards the manifold in a smooth manner. In this article, we provide new practical and theoretical developments for the landing algorithm. First, the method is extended to the Stiefel manifold, the set of rectangular orthogonal matrices. We also consider stochastic and variance reduction algorithms when the cost function is an average of many functions. We demonstrate that all these methods have the same rate of convergence as their Riemannian counterparts that exactly enforce the constraint. Finally, our experiments demonstrate the promise of our approach to an array of machine-learning problems that involve orthogonality constraints.  ( 2 min )
    Probabilistic inverse optimal control with local linearization for non-linear partially observable systems. (arXiv:2303.16698v1 [cs.LG])
    Inverse optimal control methods can be used to characterize behavior in sequential decision-making tasks. Most existing work, however, requires the control signals to be known, or is limited to fully-observable or linear systems. This paper introduces a probabilistic approach to inverse optimal control for stochastic non-linear systems with missing control signals and partial observability that unifies existing approaches. By using an explicit model of the noise characteristics of the sensory and control systems of the agent in conjunction with local linearization techniques, we derive an approximate likelihood for the model parameters, which can be computed within a single forward pass. We evaluate our proposed method on stochastic and partially observable version of classic control tasks, a navigation task, and a manual reaching task. The proposed method has broad applicability, ranging from imitation learning to sensorimotor neuroscience.  ( 2 min )
    Comparing Machine Learning Methods for Estimating Heterogeneous Treatment Effects by Combining Data from Multiple Randomized Controlled Trials. (arXiv:2303.16299v1 [stat.ME])
    Individualized treatment decisions can improve health outcomes, but using data to make these decisions in a reliable, precise, and generalizable way is challenging with a single dataset. Leveraging multiple randomized controlled trials allows for the combination of datasets with unconfounded treatment assignment to improve the power to estimate heterogeneous treatment effects. This paper discusses several non-parametric approaches for estimating heterogeneous treatment effects using data from multiple trials. We extend single-study methods to a scenario with multiple trials and explore their performance through a simulation study, with data generation scenarios that have differing levels of cross-trial heterogeneity. The simulations demonstrate that methods that directly allow for heterogeneity of the treatment effect across trials perform better than methods that do not, and that the choice of single-study method matters based on the functional form of the treatment effect. Finally, we discuss which methods perform well in each setting and then apply them to four randomized controlled trials to examine effect heterogeneity of treatments for major depressive disorder.  ( 2 min )
    Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees. (arXiv:2303.16841v1 [cs.LG])
    In this paper, we propose a randomly projected convex clustering model for clustering a collection of $n$ high dimensional data points in $\mathbb{R}^d$ with $K$ hidden clusters. Compared to the convex clustering model for clustering original data with dimension $d$, we prove that, under some mild conditions, the perfect recovery of the cluster membership assignments of the convex clustering model, if exists, can be preserved by the randomly projected convex clustering model with embedding dimension $m = O(\epsilon^{-2}\log(n))$, where $0 < \epsilon < 1$ is some given parameter. We further prove that the embedding dimension can be improved to be $O(\epsilon^{-2}\log(K))$, which is independent of the number of data points. Extensive numerical experiment results will be presented in this paper to demonstrate the robustness and superior performance of the randomly projected convex clustering model. The numerical results presented in this paper also demonstrate that the randomly projected convex clustering model can outperform the randomly projected K-means model in practice.  ( 2 min )
    Operator learning with PCA-Net: upper and lower complexity bounds. (arXiv:2303.16317v1 [cs.LG])
    Neural operators are gaining attention in computational science and engineering. PCA-Net is a recently proposed neural operator architecture which combines principal component analysis (PCA) with neural networks to approximate an underlying operator. The present work develops approximation theory for this approach, improving and significantly extending previous work in this direction. In terms of qualitative bounds, this paper derives a novel universal approximation result, under minimal assumptions on the underlying operator and the data-generating distribution. In terms of quantitative bounds, two potential obstacles to efficient operator learning with PCA-Net are identified, and made rigorous through the derivation of lower complexity bounds; the first relates to the complexity of the output distribution, measured by a slow decay of the PCA eigenvalues. The other obstacle relates the inherent complexity of the space of operators between infinite-dimensional input and output spaces, resulting in a rigorous and quantifiable statement of the curse of dimensionality. In addition to these lower bounds, upper complexity bounds are derived; first, a suitable smoothness criterion is shown to ensure a algebraic decay of the PCA eigenvalues. Then, it is shown that PCA-Net can overcome the general curse of dimensionality for specific operators of interest, arising from the Darcy flow and Navier-Stokes equations.  ( 2 min )
    Maximum likelihood method revisited: Gauge symmetry in Kullback -- Leibler divergence and performance-guaranteed regularization. (arXiv:2303.16721v1 [stat.ML])
    The maximum likelihood method is the best-known method for estimating the probabilities behind the data. However, the conventional method obtains the probability model closest to the empirical distribution, resulting in overfitting. Then regularization methods prevent the model from being excessively close to the wrong probability, but little is known systematically about their performance. The idea of regularization is similar to error-correcting codes, which obtain optimal decoding by mixing suboptimal solutions with an incorrectly received code. The optimal decoding in error-correcting codes is achieved based on gauge symmetry. We propose a theoretically guaranteed regularization in the maximum likelihood method by focusing on a gauge symmetry in Kullback -- Leibler divergence. In our approach, we obtain the optimal model without the need to search for hyperparameters frequently appearing in regularization.  ( 2 min )

  • Open

    [P] Fabrice Bellard's TextSynth Server
    I recently came across this project on Mr. Bellard's website, (QEMU's creator) which is the backend behind TextSynth: "TextSynth provides access to large language or text-to-image models such as GPT-J, GPT-Neo, M2M100, CodeGen, Stable Diffusion thru a REST API and a playground. [...] TextSynth employs custom inference code to get faster inference (hence lower costs) on standard GPUs and CPUs. [...]" Here are TextSynth Server's features: "Supports many Transformer variants (GPT-J, GPT-NeoX, GPT-Neo, OPT, Fairseq GPT, M2M100, CodeGen, GPT2, T5, RWKV, LLAMA) and Stable Diffusion. Integrated REST JSON API for text completion, translation and image generation. It is used by textsynth.com. Very high performance for small and large batches on CPU and GPU. Support of dynamic batching …  ( 45 min )
    [D] Style/ object consistent image generation
    Hi all, Stable Diffusion has been shown to be capable of amazing text -> image generations. If the image generates an object or character in the scene, is it possible to create a new scene that contains the same object? For example, creating an OC character in one image, and then generating a series of images where the same character is doing a series of different tasks. Alternatively, this could be a playground, and then pictures of the same playground but destroyed, or filled with kids. This is different from variations since I would want the same object in the same style but in different contexts. To some extent I believe this idea of generation consistency could be applied to making consistent UIs or graphic content. I was wondering if there were any papers and/or repos that tackle this issue. Thanks all in advance! submitted by /u/Macetodaface [link] [comments]  ( 44 min )
    [D] Training a 65b LLaMA model
    I apologize if what I'm about to say sounds trivial, but I recently trained the 7b version of llama on my json dataset containing 122k questions and answers. The results were quite good, but I noticed that about 30% of the answers could be improved. I've heard that the 65b model is significantly better, so I'm interested in training it to see how it performs. I already tried Google Colab (high-ram), Paperspace, Deepnote, and Jetbrains, and all crashed. I'm wondering how I can realistically train the 65b model with my $1k budget and complete the training process without any major issues? Any advice is appreciated. submitted by /u/Business-Lead2679 [link] [comments]  ( 45 min )
    [D] Anyone aware of research that examines the effect of different beam sizes during LLM evaluation?
    I am surprisingly having trouble finding comparisons between beam sizes. Large papers seem to only declare what beam size they use, but I would love to see some sort of graph/data that shows the trend of accuracy as beam size increases. My intuition would be that large beam sizes would be extremely beneficial for difficult problems, but obviously also extremely computationally expensive. submitted by /u/BadassGhost [link] [comments]  ( 7 min )
    [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development
    imaginary.dev is a project I built to allow web developers to use GPT to easily add AI features to their existing web user interfaces. All a developer does is declares a function prototype in TypeScript with a good comment saying what the function should do, and then they can call the function from other TypeScript and JavaScript code, even though they've never implemented the function in question. It looks something like: /** * This function takes in a blog post text and returns at least 5 good titles for the blog post. * The titles should be snappy and interesting and entice people to click on the blog post. * * @param blogPostText - string with the blog post text * @returns an array of at least 5 good, enticing titles for the blog post. * * @imaginary */ declare function titleForBlogPost(blogPostText: string): Promise>; Under the covers, we've written a TypeScript and Babel plugin that replaces these "imaginary function" declarations with runtime calls to GPT asking GPT what the theoretical function would return for a particular set of inputs. So it's not using GPT to write code (like CoPilot or Ghostwriter); it's using GPT to act as the runtime. This gives you freedom to implement things that you could never do in traditional programming: classification, extraction of structured information out of human language, translation, spell checking, creative generation, etc. Here's a screencast where I show off adding intelligent features to a simple blog post web app: https://www.loom.com/share/b367f4863fe843998270121131ae04d9 Let me know what you think. Is this useful? Is this something you think you'd enjoy using? Is this a good direction to take web development? Happy to hear any and all feedback! submitted by /u/xander76 [link] [comments]  ( 7 min )
    [D] Improvements/alternatives to U-net for medical images segmentation?
    Hello, I am working on a project in which I'm detecting cavities in X-rays. The dataset I have is pretty limited (~100 images). Each X-ray has a black and white mask that shows where in the image are the cavities. I'm trying to improve my results. What I've tried so far: different loss functions: BCE, dice loss, bce+dice, tversky loss, focal tversky loss modifying the images' gamma to make the cavities more visible trying out different U-Nets: U-net, V-net, U-net++, UNET 3+, Attention U-net, R2U-net, ResUnet-a, U^2-Net, TransUNET, and Swin-UNET None of the new U-nets that I've tried improved the results. Probably because they are more suited for a larger dataset. I'm now looking for other things to try to improve my results. Currently my network is detecting cavities, but it has trouble with the smaller ones. submitted by /u/viertys [link] [comments]  ( 47 min )
    [D] What would be the best way to build a catalogue of faces based on a bunch of videos?
    Hi, I have a bunch of videos (about 1000) that contain various people, about 300 people. I want to build an app that allows me to build a catalogue of the people that appear in every video, based on their faces, so that i can just ask, "In which videos does x person appear?" or "Which people appear in this video?" Are there any projects that already do this? If not, what would be the best libraries to achieve this? submitted by /u/Chance-Specialist132 [link] [comments]  ( 43 min )
    [Discussion] Using GIFs to contextualize LLM responses
    This post is related to the casual/conversational use of LLMs. It mostly applies to developers of novelty LLM apps but the conversation about user immersion could apply to serious machine learning circles as well. DISCUSSION IDEA Overlaying an LLM response onto a relevant GIF can be a low-cost method to increase user immersion It could be thought of as similar to attaching a TTS voice to the response. It makes it a little more human, like someone sending you memes. The proposed workflow for this is below: WORKFLOW Instruct the LLM to follow a strict response template of Relevant GIF name, Reply Extract GIF name and Reply separate Feed GIF name into a search (Giphy in this case) and return the first result Split GIF into frames Overlay Reply into each frame and combine into the final GIF Users can reply using text but the LLM retains the context of the conversation and returns appropriate GIFs overlain with corresponding text. Once text-to-video and VR develops further, user immersion would go to a whole new level - GIFs could be a useful intermediary. submitted by /u/Microsofte [link] [comments]  ( 44 min )
    [R] The Debate Over Understanding in AI’s Large Language Models
    submitted by /u/currentscurrents [link] [comments]  ( 45 min )
    [D] The best way to train an LLM on company data
    Hey guys, I want to train any LLM on my company’s data we have stored in Azure and Snowflake It’s all in tabular form, and I was wondering how can I train an LLM on the data, and be able to ask it questions about it. No computations required from the model, but at least be able to tell answer questions such as: What was Apple’s return compared to it’s sector last month ( we have financial data) - is it possible to train an LLM to understand tabluar data - is it possible to train it on Snowflake/Azure Any help or links would be appreciated! submitted by /u/jaxolingo [link] [comments]  ( 52 min )
    [D] llama 7b vs 65b ?
    Hello what are we talking in term of diminishing returns between the 2 models ? do the 65b really improve a lot ? bonus question: how to train the 7b model to learn specific field on my computer ? (makin it tailored to my needs) submitted by /u/deck4242 [link] [comments]  ( 43 min )
    [D] Alternatives to fb Hydra?
    I have been trying to find a nice tech stack I like for designing and running machine learning models, and currently I'm trying out mlflow, hydra, and optuna. However, hydra seems to have several limitations that are really annoying and are making me reconsider my choice. Most problematic is the inability to group parameters together in a multirun. Hydra only supports trying all combinations of parameters, as described in https://github.com/facebookresearch/hydra/issues/1258, which does not seem to be a priority for hydra. Furthermore, hydras optuna optimizer implementation does not allow for early pruning of bad runs, which while not a deal breaker is definitely a nice to have feature. What I do like about hydra is their ability to combine config yaml, using defaults. So does anyone have any good alternatives or suggestions for how to fix this or what to switch to? submitted by /u/alyflex [link] [comments]  ( 7 min )
    [Discussion] IsItBS: asking GPT to reflect x times will create a feedback loop that causes it to scrutinize itself x times?
    I've seen posts claiming this. There is a paper saying that self-scrutiny feedback loops can improve the performance of GPT-4 by 30%. I've experimented with feedback loops using the API and don't doubt that this can, or in future may be able to, produce emergent behaviour. I'm no expert but my surface-level understanding of transformers is that they would not create feedback loops just from prompting and would merely just respond as if they were. If it were true, it would have significant economical implications since creating the feedback loop separately multiplies the price each loop. submitted by /u/RedditPolluter [link] [comments]  ( 45 min )
    [D] Summer School on Systems Vision Science in Tuebingen, Germany, application deadline this Friday
    Systems vision science combines computational, behavioral, and neuroscience methods to discover functions and algorithms for vision in various brain regions and their implementations in neural circuits. Should be interesting for computer vision/machine learning researchers interested in learning about biological vision submitted by /u/retinex [link] [comments]  ( 43 min )
    [D] What are the problems you face in your data science workflow?
    Hi All, We are a team of data analyst. We are doing a survey on the problems data scientists face while doing their work. Please comment if you have anything to share from your experience. In our personal work, we have found that sharing our work across our team and saving a history of our work progress is challenging. Thanks submitted by /u/lightversetech [link] [comments]  ( 43 min )
    [R] You Only Segment Once: Towards Real-Time Panoptic Segmentation [CVPR 2023]
    Happy to introduce our latest work on Panoptic Segmentation: YOSO. And to the best of our knowledge, YOSO is the first read-time panoptic segmentation framework that delivers competitive performance compared to the state-of-the-art models. The code is available here: https://github.com/hujiecpp/YOSO Specifically, YOSO achieves 46.4 PQ, 45.5 FPS on COCO; 52.5 PQ, 22.6 FPS on Cityscapes; 38.0 PQ, 35.4 FPS on ADE20k and 34.1 PQ 7.1 FPS on Mapillary Vistas. ​ https://preview.redd.it/5bmcl1n6bmqa1.png?width=1362&format=png&auto=webp&s=7394a0abdfa31da1bebe7c2dc9cf18abde73db47 https://preview.redd.it/o4pkvri7bmqa1.png?width=674&format=png&auto=webp&s=118af84ba363c0ebcbf0ff21bf8c528d446688e6 https://preview.redd.it/ufdaa028bmqa1.png?width=681&format=png&auto=webp&s=f22f5d667ca91e4499a936c87aabb2fa023fff4e https://preview.redd.it/y2t2upj8bmqa1.png?width=603&format=png&auto=webp&s=d169d49c81fa8a00e6cddc12a36d453764362d5c https://preview.redd.it/3wiyep09bmqa1.png?width=597&format=png&auto=webp&s=b4beb8e8983ec5d750b78c79f0c775db3b49e427 https://preview.redd.it/g1lhwce9bmqa1.png?width=595&format=png&auto=webp&s=470a519abe02ab7022699a6d68580b252072b388 ​ submitted by /u/Technical-Vast1314 [link] [comments]  ( 44 min )
    [R] AI-Virology Integration
    Hello everyone, I am currently working on a research paper that explores the integration of AI-aided immune tweening with permafrost immunity. Permafrost melting has released ancient and novel pathogens that we have no cures for, posing a significant threat to public health. As a way of identifying these viruses, we have discovered that AI can perform fractal analysis called fractalomics, which helps us understand their shapes. However, I am seeking more specific AI-related topics to look into and any related papers or researchers. Additionally, any related ideas of integrating AI with microscopic biology and predictions of patterns would be helpful in understanding the impact of permafrost melting on public health. If anyone has any insights or knowledge on the use of AI in identifying and characterizing immuno-regulatory pathway sequences or related topics, please feel free to share. Any information related to permafrost immunity and its impact on immune surveillance, oral pathology, and public health would also be appreciated. Thank you for your help! submitted by /u/souper-nerd [link] [comments]  ( 44 min )
    [D] Do model weights have the same license as the modem architecture?
    Lightning AI released Lit-LLaMa: an architecture based on Meta’s LLaMa but with a more permissive license. However, they still rely on the weights trained by Meta, which have a license restricting commercial usage. Is developing the architecture enough to change the license associated with the model’s weights? submitted by /u/murphwalker [link] [comments]  ( 47 min )
    [D] Pause Giant AI Experiments: An Open Letter. Signatories include Stuart Russell, Elon Musk, and Steve Wozniak
    Letter: https://futureoflife.org/open-letter/pause-giant-ai-experiments/ The Future of Life Institute has come out with an open letter calling for a moratorium on AI training of models more powerful than GPT-4 with an impressive list of people signing on. I, for one, think this may be one of our only chances to stop the catastrophe that we are barreling into with the unregulated race to create the most powerful AIs possible by everyone with a big computer. What are your thoughts? Resources to Learn More About The Dangers of Misaligned Advanced AI: https://vkrakovna.wordpress.com/ai-safety-resources/ https://www.youtube.com/@RobertMilesAI View Poll submitted by /u/GenericNameRandomNum [link] [comments]  ( 44 min )
  • Open

    Getting lost with all these LLM-related projects
    ChatGPT, GPT-4, Alpaca, LLaMa, Bard, Bing GPT... LLMs have popped up like crypto projects two years ago. Beside ChatGPT with GPT-4, what others are worth tracking right now? Am I correct in saying that cloud-based go for ChatGPT, local go for Alpaca, and ignore the rest? submitted by /u/yzT- [link] [comments]  ( 43 min )
    I have created a monster
    submitted by /u/transdimensionalmeme [link] [comments]  ( 42 min )
    How much room to evolve does the AI field still have?
    Hey guys, i work as a software engineer. And am thinking of when finishing my degree to maybe get a masters and phd to work as an AI researcher. But i'm afraid that when the time comes where i could do research that wouldn't be any groundbreaking fields inside AI to improve. Is kind of a silly fear, but one i have nonetheless. What are you guys thoughts on the matter. Do you think that in 10 years AI still going to be able to evolve even more? Thanks. submitted by /u/Sherlockyz [link] [comments]  ( 43 min )
    Would you like/use an AI-centered social media app?
    I'm thinking about building a Twitter-like app focused exclusively on AI content. After ChatGPT was released I believe there was an inflection point and that the pace of change in AI is going to continue accelerating. Recently, I've noticed that all I'm interested in seeing when I go on Twitter or Reddit is AI progress, but that there is a bunch of extra fluff that I don't want. Both Reddit and Twitter are constantly trying to promote other things and distract you from what you came to look at. I'm doing this poll to see if anyone else would be interested in an AI-focused feed. ​ Appreciate feedback! View Poll submitted by /u/Xponential_AI [link] [comments]  ( 43 min )
    An extremely deep dive conversation with Sam Altman: OpenAI CEO covering GPT-4 and the Future of AI with Lex Fridman.
    submitted by /u/AverageCowboyCentaur [link] [comments]  ( 6 min )
    Data Entry Automation & AI - Where to start?
    Hi everyone, I'm very new to the AI train (only started paying attention at GPT-3 and really getting interested with the news about GPT-4), and was hoping you could help point me in the right direction for integrating AI & automation into my work. I am a line manager in a large organisation (sorry I don't want to give too many details as it is a government body), and a big part of the work we do is data entry. I am certain that we could use AI to improve our output in a major way and I am curious if anyone could point me in the right direction to get started. Basically we receive applications, both digitally and on paper, and the information on the applications is entered into our internal systems. Any missing documents are requested and when everything is finally received we send the completed applications to the next stage in the process. This area is ripe for automation - the work is repetitive, relatively simple but time consuming due to the volumes of work needing completion, and if it could be automated with an AI trained to handle 95% of the cases could save 10s of millions of dollars a year, plus our staff could actually get on with the interesting parts of their jobs. It'd be a pretty radical shift in policy if I could get it across the line, but I think it's worth exploring but I need to know where to start. Which AI tools are going to be the best optimized to assist me? I know GPT4 can write code and such, but what about reading human handwriting and training it to use our software (which is wildly outdated anyway)? Any advice you have would be amazing really. submitted by /u/improvethyself1314 [link] [comments]  ( 7 min )
    A program or intelligence to verify prescriptions?
    I'm a pharmacist and have absolutely no training in software development. Anyone have any insights on whether it's possible and with what a program could read and verify prescriptions for certain types of errors (wrong drug/dose/instructions for a certain patient/ patient history) Day to day I see so many medication errors that are caught but humans are not flawless. I'm just wondering if it's possible, been done? Or am I just dreaming. submitted by /u/moneyisjustplastic [link] [comments]  ( 42 min )
    I'm Looking for a Photo Organization Service
    I've spent the last decade as a hobbies photographer. Landscapes, nature, street, that kind of thing. I have amassed close to a quarter of a million photos, all raw, high resolution taken with a DSLR. What I'm looking for is a way to organize, potentially rank my photos, and potentially have an AI analyze them for "quality". I have seen this kind of thing done with faces and headshots, but I'm not looking for that. (I assume I'd have to mass process the images into a smaller resolution, which is fine). Does such a service exist? Does something similar exist? Thanks for any input. submitted by /u/venicerocco [link] [comments]  ( 43 min )
    Killswitch Engineer
    submitted by /u/jaketocake [link] [comments]  ( 6 min )
    Law Ai
    My friend who is an attorney wants to input some legal work product and train Ai for future work product creation. Any recommendations on how to start such a thing? submitted by /u/mikemongo [link] [comments]  ( 43 min )
    gpt roasts devs
    submitted by /u/Tomoko--Kuroki [link] [comments]  ( 42 min )
    [Reuters] Elon Musk and others urge AI pause, citing risks to society (and their prospects of building an insurmountable tech lead before anyone else can challenge them)
    submitted by /u/farraway45 [link] [comments]  ( 49 min )
    Reading Clubs for ML papers. Who wants to join?
    Hey! My friend, a Ph.D. student in Computer Science at Oxford and an MSc graduate from Cambridge, and I (a Backend Engineer), started a reading club where we go through 20 research papers that cover 80% of what matters today Our goal is to read one paper a week, then meet to discuss it and share knowledge, and insights and keep each other accountable, etc. I shared it with a few friends and was surprised by the high interest to join. So I decided to invite you guys to join us as well. We are looking for ML enthusiasts that want to join our reading clubs (there are already 3 groups). The concept is simple - we have a discord that hosts all of the “readers” and I split all readers (by their background) into small groups of 6, some of them are more active (doing additional exercises, etc it depends on you.), and some are less demanding and mostly focus on reading the papers. As for prerequisites, I think its recommended to have at least BSC in CS or equivalent knowledge and the ability to read scientific papers in English ​ If any of you are interested to join please comment below And if you have any suggestions feel free to let me know ​ Some of the articles on our list: Attention is all you need BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding A Style-Based Generator Architecture for Generative Adversarial Networks Mastering the Game of Go with Deep Neural Networks and Tree Search Deep Neural Networks for YouTube Recommendations submitted by /u/__god_bless_you_ [link] [comments]  ( 7 min )
    Fear mongering by Elon Musk, experts : urge pause on training AI systems more powerful than GPT-4
    submitted by /u/ahivarn [link] [comments]  ( 45 min )
    Let’s make a thread of FREE AI TOOLS you would recommend
    Tons of AI tools are being generated but only few are powerful and free like ChatGPT. Please add the free AI tools you’ve personally used with the best use case to help the community. submitted by /u/superzzgirl [link] [comments]  ( 42 min )
    I need help choosing a research direction
    Hi, can someone give me advice on what sort of research projects I can look for based on my interests? I'm finishing a BSc in Mathematics and what I've enjoyed most so far have been set theory, mathematical logic and theoretical computer science (I've sort of always planned to do something CS-related from a math angle, but am interested to hear about anything). I have been learning AI on the side and through summer jobs, and am particularly interested in doing something related to interpretability of LLMs. Is there anyone with a similar set of interests who can tell me what sort of work (or maybe niche areas of Math I might not have heard of) they've enjoyed? I've seen AI projects that mention symbolic AI, game theoretic approaches, logic based reasoning etc. However, 1) they all sound interesting but since I don't understand everything in the project descriptions, it's hard to tell if I'd find them interesting, what kind of background they're best suited for, whether they entail engineering or theoretical work etc., and 2) I'd still like advice to make sure there aren't opportunities I'm missing. I'd especially appreciate concrete research programs that I can google (eg I liked this one https://safeandtrustedai.org/apply-now/#projects and am considering applying to the Center for Long Term Risk's internship). Anything is helpful. Thank you!! submitted by /u/anamarijakozina [link] [comments]  ( 43 min )
    Viral image of Pope Francis underscores danger of AI spreading misinformation, experts say
    ​ PHOTO CREDIT: MIDJOURNEY By Shannon Larson Over the weekend, an image of Pope Francis spread like wildfire on social media platforms. In place of his daily attire of austere white cassocks, he adopted a more modern look — a long, trendy white puffer coat, his traditional pectoral cross resting above a cinched waist. The pontiff exuded “drip” and “swagger,” in the words of online commenters, and received widespread praise for his striking style choices. While many assumed the picture was genuine, given its realistic look and a pope known for breaking convention, it was entirely fake. The visual is the latest example of an image produced using an easily accessible image generator powered by artificial intelligence. In this case, the program was Midjourney, with the deepfake image shar…  ( 46 min )
    i can handle this
    submitted by /u/faxfrag [link] [comments]  ( 42 min )
    Is there a way?
    With the new capabilities of AI. I’d like to start training a bot with all my Memories, views and speech habits. Something like a chatbot biography that my kids can talk to after I’m gone? Any suggestions? submitted by /u/Justdudeatplay [link] [comments]  ( 45 min )
    Microsoft has reportedly threatened to restrict access to data for companies using rival AI and search tools
    This move could potentially harm Microsoft's competitors in the tech industry, who rely on access to this data to improve their own AI products. This move by Microsoft could be seen as an attempt to maintain its dominance in the AI industry and limit competition. However, it could also draw scrutiny from regulators, who have been increasingly focused on antitrust concerns in the tech industry. Microsoft has not yet commented on the report, and it is unclear what specific actions the company may take in the future. However, this development is sure to be closely watched by competitors and industry observers alike. submitted by /u/ChirperChiara [link] [comments]  ( 7 min )
    It will become necessary to adopt the term BAI (Before AI) and AAI (After AI) in order to distinguish a time when images, texts, media and more was being exclusively produced by humans
    Scientific papers are being produced right now with the help of GPT4, AI images are becoming indistinguishable from real life images, everything will change so much in the coming decade that we will have the need to distinguish between two eras. submitted by /u/arcotime29 [link] [comments]  ( 7 min )
    Interesting interview with Sam Altman
    Great listen discussing AGI and ChatGPT submitted by /u/acatinasweater [link] [comments]  ( 42 min )
    Breaking Bad as The Simpsons
    submitted by /u/fignewtgingrich [link] [comments]  ( 42 min )
    What will be the posible implications if OpenAI become actually Open?
    Most of the latest work of OpenAI in models like GPT-4 is totally close, they says is because the competition and security. Seeing how impresive is the technology in his infancy and how much they try to hold the models to be "polite" and "unbiassed", what will be possible implications if they release all the models and theirs dataset to the public? How dangerous could it be if they allow anyone modify the model to fit theirs necessities, being ethical or not. ​ Would you like the tecnology open or not? submitted by /u/NeoCiber [link] [comments]  ( 7 min )
  • Open

    How come I can use PPO for CarRacing but not SAC
    I am doing a university project where I am comparing different RL algorithms on gym environments. I want to compare SAC (and some others) to PPO benchmarking on the CarRacing environment but I keep getting this error: MemoryError: Unable to allocate 25.7 GiB for an array with shape (1000000, 1, 3, 96, 96) and data type uint8 ​ https://preview.redd.it/z7jh5eoz7rqa1.png?width=1920&format=png&auto=webp&s=8a1bec5d4c0b2deb3d11dcb03ab4137affe3ffc6 ​ Anyone know why? submitted by /u/stainedmug [link] [comments]  ( 43 min )
    Extending The Monte Carlo CFR With Importance Sampling For Agent Exploration
    submitted by /u/abstractcontrol [link] [comments]  ( 7 min )
  • Open

    HAYAT HOLDING uses Amazon SageMaker to increase product quality and optimize manufacturing output, saving $300,000 annually
    This is a guest post by Neslihan Erdogan, Global Industrial IT Manager at HAYAT HOLDING. With the ongoing digitization of the manufacturing processes and Industry 4.0, there is enormous potential to use machine learning (ML) for quality prediction. Process manufacturing is a production method that uses formulas or recipes to produce goods by combining ingredients […]  ( 11 min )
    Achieve effective business outcomes with no-code machine learning using Amazon SageMaker Canvas
    On November 30, 2021, we announced the general availability of Amazon SageMaker Canvas, a visual point-and-click interface that enables business analysts to generate highly accurate machine learning (ML) predictions without having to write a single line of code. With Canvas, you can take ML mainstream throughout your organization so business analysts without data science or […]  ( 7 min )
    How the UNDP Independent Evaluation Office is using AWS AI/ML services to enhance the use of evaluation to support progress toward the Sustainable Development Goals
    The United Nations (UN) was founded in 1945 by 51 original Member States committed to maintaining international peace and security, developing friendly relations among nations, and promoting social progress, better living standards, and human rights. The UN is currently made up of 193 Member States and has evolved over the years to keep pace with […]  ( 9 min )
  • Open

    Research Focus: Week of March 27, 2023
    Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. Machine translation (MT) models are designed to learn from large amounts of data collected from real-world sources. However, this training data may contain implicit biases which may be […] The post Research Focus: Week of March 27, 2023 appeared first on Microsoft Research.  ( 9 min )
  • Open

    Bacterial injection system delivers proteins in mice and human cells
    With further development, the programmable system could be used in a range of applications including gene and cancer therapies.  ( 8 min )
  • Open

    DSC Weekly 29 March 2023 – New Books and Courses Explore Synthetic Data, ML Strategies
    Announcements New Books and Courses Explore Synthetic Data, ML Strategies MLtechniques released two new books recently. The first one, version 4.1 now, deals with synthetic data. This updated version includes a chapter on GAN (generative adversarial networks), with a comparison to more traditional methods such as copulas. Applied to real-life datasets, the author discusses the… Read More »DSC Weekly 29 March 2023 – New Books and Courses Explore Synthetic Data, ML Strategies The post DSC Weekly 29 March 2023 – New Books and Courses Explore Synthetic Data, ML Strategies appeared first on Data Science Central.  ( 19 min )
  • Open

    Blender Update 3.5 Fuels 3D Content Creation, Powered by NVIDIA GeForce RTX GPUs
    Blender, the world’s most popular 3D creation suite — free and open source — released its major version 3.5 update. Expected to have a profound impact on 3D creative workflows, this latest release features support for Open Shading Language (OSL) shaders with the NVIDIA OptiX ray-tracing engine.  ( 7 min )
    Ubisoft’s Yves Jacquier on How Generative AI Will Revolutionize Gaming
    Tools like ChatGPT have awakened the world to the potential of generative AI. Now, much more is coming. On the latest episode of the NVIDIA AI Podcast, Yves Jacquier, executive director of Ubisoft La Forge, shares valuable insights into the transformative potential of generative AI in the gaming industry. With over two decades of experience Read article >  ( 5 min )
  • Open

    Privacy implications of hashing data
    Cryptographic hash functions are also known as one-way functions because given an input x, one can easily compute its hashed value f(x), but it is impractical to recover x from knowing f(x). However, if we know that x comes from a small universe of possible values, our one-way function can effectively become a two-way function, […] Privacy implications of hashing data first appeared on John D. Cook.  ( 7 min )
  • Open

    AI-Virology Integration
    Hello everyone, I am currently working on a research paper that explores the integration of AI-aided immune tweening with permafrost immunity. Permafrost melting has released ancient and novel pathogens that we have no cures for, posing a significant threat to public health. As a way of identifying these viruses, we have discovered that AI can perform fractal analysis called fractalomics, which helps us understand their shapes. However, I am seeking more specific AI-related topics to look into and any related papers or researchers. Additionally, any related ideas of integrating AI with microscopic biology and predictions of patterns would be helpful in understanding the impact of permafrost melting on public health. If anyone has any insights or knowledge on the use of AI in identifying and characterizing immuno-regulatory pathway sequences or related topics, please feel free to share. Any information related to permafrost immunity and its impact on immune surveillance, oral pathology, and public health would also be appreciated. Thank you for your help! submitted by /u/souper-nerd [link] [comments]  ( 42 min )
    Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators (12G VRAM)
    submitted by /u/nickb [link] [comments]  ( 41 min )
  • Open

    Visual Chain-of-Thought Diffusion Models. (arXiv:2303.16187v1 [cs.CV])
    Recent progress with conditional image diffusion models has been stunning, and this holds true whether we are speaking about models conditioned on a text description, a scene layout, or a sketch. Unconditional image diffusion models are also improving but lag behind, as do diffusion models which are conditioned on lower-dimensional features like class labels. We propose to close the gap between conditional and unconditional models using a two-stage sampling procedure. In the first stage we sample an embedding describing the semantic content of the image. In the second stage we sample the image conditioned on this embedding and then discard the embedding. Doing so lets us leverage the power of conditional diffusion models on the unconditional generation task, which we show improves FID by 25-50% compared to standard unconditional generation.  ( 2 min )
    End-To-End Optimization of LiDAR Beam Configuration for 3D Object Detection and Localization. (arXiv:2201.03860v2 [cs.RO] UPDATED)
    Existing learning methods for LiDAR-based applications use 3D points scanned under a pre-determined beam configuration, e.g., the elevation angles of beams are often evenly distributed. Those fixed configurations are task-agnostic, so simply using them can lead to sub-optimal performance. In this work, we take a new route to learn to optimize the LiDAR beam configuration for a given application. Specifically, we propose a reinforcement learning-based learning-to-optimize (RL-L2O) framework to automatically optimize the beam configuration in an end-to-end manner for different LiDAR-based applications. The optimization is guided by the final performance of the target task and thus our method can be integrated easily with any LiDAR-based application as a simple drop-in module. The method is especially useful when a low-resolution (low-cost) LiDAR is needed, for instance, for system deployment at a massive scale. We use our method to search for the beam configuration of a low-resolution LiDAR for two important tasks: 3D object detection and localization. Experiments show that the proposed RL-L2O method improves the performance in both tasks significantly compared to the baseline methods. We believe that a combination of our method with the recent advances of programmable LiDARs can start a new research direction for LiDAR-based active perception. The code is publicly available at https://github.com/vniclas/lidar_beam_selection  ( 3 min )
    Generalization and Stability of Interpolating Neural Networks with Minimal Width. (arXiv:2302.09235v2 [stat.ML] UPDATED)
    We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $\epsilon$ and their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training data achieves training error $O(g(1/T)^2 /T)$ and generalization error $O(g(1/T)^2 /n)$ at iteration $T$, provided there are at least $m=\Omega(g(1/T)^4)$ hidden neurons. We then show that our realizable setting encompasses a special case where data are separable by the model's neural tangent kernel. For this and logistic-loss minimization, we prove the training loss decays at a rate of $\tilde O(1/ T)$ given polylogarithmic number of neurons $m=\Omega(\log^4 (T))$. Moreover, with $m=\Omega(\log^{4} (n))$ neurons and $T\approx n$ iterations, we bound the test loss by $\tilde{O}(1/n)$. Our results differ from existing generalization outcomes using the algorithmic-stability framework, which necessitate polynomial width and yield suboptimal generalization rates. Central to our analysis is the use of a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Eventually, despite the objective's non-convexity, this leads to convergence and generalization-gap bounds that resemble those found in the convex setting of linear logistic regression.  ( 2 min )
    ASIC: Aligning Sparse in-the-wild Image Collections. (arXiv:2303.16201v1 [cs.CV])
    We present a method for joint alignment of sparse in-the-wild image collections of an object category. Most prior works assume either ground-truth keypoint annotations or a large dataset of images of a single object category. However, neither of the above assumptions hold true for the long-tail of the objects present in the world. We present a self-supervised technique that directly optimizes on a sparse collection of images of a particular object/object category to obtain consistent dense correspondences across the collection. We use pairwise nearest neighbors obtained from deep features of a pre-trained vision transformer (ViT) model as noisy and sparse keypoint matches and make them dense and accurate matches by optimizing a neural network that jointly maps the image collection into a learned canonical grid. Experiments on CUB and SPair-71k benchmarks demonstrate that our method can produce globally consistent and higher quality correspondences across the image collection when compared to existing self-supervised methods. Code and other material will be made available at \url{https://kampta.github.io/asic}.  ( 2 min )
    Neural Collapse Inspired Federated Learning with Non-iid Data. (arXiv:2303.16066v1 [cs.LG])
    One of the challenges in federated learning is the non-independent and identically distributed (non-iid) characteristics between heterogeneous devices, which cause significant differences in local updates and affect the performance of the central server. Although many studies have been proposed to address this challenge, they only focus on local training and aggregation processes to smooth the changes and fail to achieve high performance with deep learning models. Inspired by the phenomenon of neural collapse, we force each client to be optimized toward an optimal global structure for classification. Specifically, we initialize it as a random simplex Equiangular Tight Frame (ETF) and fix it as the unit optimization target of all clients during the local updating. After guaranteeing all clients are learning to converge to the global optimum, we propose to add a global memory vector for each category to remedy the parameter fluctuation caused by the bias of the intra-class condition distribution among clients. Our experimental results show that our method can improve the performance with faster convergence speed on different-size datasets.  ( 2 min )
    Sparse Gaussian Processes with Spherical Harmonic Features Revisited. (arXiv:2303.15948v1 [stat.ML])
    We revisit the Gaussian process model with spherical harmonic features and study connections between the associated RKHS, its eigenstructure and deep models. Based on this, we introduce a new class of kernels which correspond to deep models of continuous depth. In our formulation, depth can be estimated as a kernel hyper-parameter by optimizing the evidence lower bound. Further, we introduce sparseness in the eigenbasis by variational learning of the spherical harmonic phases. This enables scaling to larger input dimensions than previously, while also allowing for learning of high frequency variations. We validate our approach on machine learning benchmark datasets.  ( 2 min )
    A Statistical Model for Predicting Generalization in Few-Shot Classification. (arXiv:2212.06461v2 [cs.LG] UPDATED)
    The estimation of the generalization error of classifiers often relies on a validation set. Such a set is hardly available in few-shot learning scenarios, a highly disregarded shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaussian model of the feature distribution. By estimating the parameters of this model, we are able to predict the generalization error on new classification tasks with few samples. We observe that accurate distance estimates between class-conditional densities are the key to accurate estimates of the generalization performance. Therefore, we propose an unbiased estimator for these distances and integrate it in our numerical analysis. We empirically show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy.  ( 2 min )
    Invariant preservation in machine learned PDE solvers via error correction. (arXiv:2303.16110v1 [math.NA])
    Machine learned partial differential equation (PDE) solvers trade the reliability of standard numerical methods for potential gains in accuracy and/or speed. The only way for a solver to guarantee that it outputs the exact solution is to use a convergent method in the limit that the grid spacing $\Delta x$ and timestep $\Delta t$ approach zero. Machine learned solvers, which learn to update the solution at large $\Delta x$ and/or $\Delta t$, can never guarantee perfect accuracy. Some amount of error is inevitable, so the question becomes: how do we constrain machine learned solvers to give us the sorts of errors that we are willing to tolerate? In this paper, we design more reliable machine learned PDE solvers by preserving discrete analogues of the continuous invariants of the underlying PDE. Examples of such invariants include conservation of mass, conservation of energy, the second law of thermodynamics, and/or non-negative density. Our key insight is simple: to preserve invariants, at each timestep apply an error-correcting algorithm to the update rule. Though this strategy is different from how standard solvers preserve invariants, it is necessary to retain the flexibility that allows machine learned solvers to be accurate at large $\Delta x$ and/or $\Delta t$. This strategy can be applied to any autoregressive solver for any time-dependent PDE in arbitrary geometries with arbitrary boundary conditions. Although this strategy is very general, the specific error-correcting algorithms need to be tailored to the invariants of the underlying equations as well as to the solution representation and time-stepping scheme of the solver. The error-correcting algorithms we introduce have two key properties. First, by preserving the right invariants they guarantee numerical stability. Second, in closed or periodic systems they do so without degrading the accuracy of an already-accurate solver.  ( 3 min )
    Exploring Deep Learning Methods for Classification of SAR Images: Towards NextGen Convolutions via Transformers. (arXiv:2303.15852v1 [eess.IV])
    Images generated by high-resolution SAR have vast areas of application as they can work better in adverse light and weather conditions. One such area of application is in the military systems. This study is an attempt to explore the suitability of current state-of-the-art models introduced in the domain of computer vision for SAR target classification (MSTAR). Since the application of any solution produced for military systems would be strategic and real-time, accuracy is often not the only criterion to measure its performance. Other important parameters like prediction time and input resiliency are equally important. The paper deals with these issues in the context of SAR images. Experimental results show that deep learning models can be suitably applied in the domain of SAR image classification with the desired performance levels.  ( 2 min )
    Ecosystem Graphs: The Social Footprint of Foundation Models. (arXiv:2303.15772v1 [cs.LG])
    Foundation models (e.g. ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention. While the models themselves garner much attention, to accurately characterize their impact, we must consider the broader sociotechnical ecosystem. We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem. Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical (e.g. how Bing relies on GPT-4) and social (e.g. how Microsoft relies on OpenAI) relationships. To supplement the graph structure, each asset is further enriched with fine-grained metadata (e.g. the license or training emissions). We document the ecosystem extensively at https://crfm.stanford.edu/ecosystem-graphs/. As of March 16, 2023, we annotate 262 assets (64 datasets, 128 models, 70 applications) from 63 organizations linked by 356 dependencies. We show Ecosystem Graphs functions as a powerful abstraction and interface for achieving the minimum transparency required to address myriad use cases. Therefore, we envision Ecosystem Graphs will be a community-maintained resource that provides value to stakeholders spanning AI researchers, industry professionals, social scientists, auditors and policymakers.  ( 2 min )
    Behavioral Machine Learning? Computer Predictions of Corporate Earnings also Overreact. (arXiv:2303.16158v1 [q-fin.ST])
    There is considerable evidence that machine learning algorithms have better predictive abilities than humans in various financial settings. But, the literature has not tested whether these algorithmic predictions are more rational than human predictions. We study the predictions of corporate earnings from several algorithms, notably linear regressions and a popular algorithm called Gradient Boosted Regression Trees (GBRT). On average, GBRT outperformed both linear regressions and human stock analysts, but it still overreacted to news and did not satisfy rational expectation as normally defined. By reducing the learning rate, the magnitude of overreaction can be minimized, but it comes with the cost of poorer out-of-sample prediction accuracy. Human stock analysts who have been trained in machine learning methods overreact less than traditionally trained analysts. Additionally, stock analyst predictions reflect information not otherwise available to machine algorithms.  ( 2 min )
    Transformer and Snowball Graph Convolution Learning for Biomedical Graph Classification. (arXiv:2303.16132v1 [cs.LG])
    Graph or network has been widely used for describing and modeling complex systems in biomedicine. Deep learning methods, especially graph neural networks (GNNs), have been developed to learn and predict with such structured data. In this paper, we proposed a novel transformer and snowball encoding networks (TSEN) for biomedical graph classification, which introduced transformer architecture with graph snowball connection into GNNs for learning whole-graph representation. TSEN combined graph snowball connection with graph transformer by snowball encoding layers, which enhanced the power to capture multi-scale information and global patterns to learn the whole-graph features. On the other hand, TSEN also used snowball graph convolution as position embedding in transformer structure, which was a simple yet effective method for capturing local patterns naturally. Results of experiments using four graph classification datasets demonstrated that TSEN outperformed the state-of-the-art typical GNN models and the graph-transformer based GNN models.  ( 2 min )
    Guided Transfer Learning. (arXiv:2303.16154v1 [cs.LG])
    Machine learning requires exuberant amounts of data and computation. Also, models require equally excessive growth in the number of parameters. It is, therefore, sensible to look for technologies that reduce these demands on resources. Here, we propose an approach called guided transfer learning. Each weight and bias in the network has its own guiding parameter that indicates how much this parameter is allowed to change while learning a new task. Guiding parameters are learned during an initial scouting process. Guided transfer learning can result in a reduction in resources needed to train a network. In some applications, guided transfer learning enables the network to learn from a small amount of data. In other cases, a network with a smaller number of parameters can learn a task which otherwise only a larger network could learn. Guided transfer learning potentially has many applications when the amount of data, model size, or the availability of computational resources reach their limits.
    JANA: Jointly Amortized Neural Approximation of Complex Bayesian Models. (arXiv:2302.09125v2 [cs.LG] UPDATED)
    This work proposes ''jointly amortized neural approximation'' (JANA) of intractable likelihood functions and posterior densities arising in Bayesian surrogate modeling and simulation-based inference. We train three complementary networks in an end-to-end fashion: 1) a summary network to compress individual data points, sets, or time series into informative embedding vectors; 2) a posterior network to learn an amortized approximate posterior; and 3) a likelihood network to learn an amortized approximate likelihood. Their interaction opens a new route to amortized marginal likelihood and posterior predictive estimation -- two important ingredients of Bayesian workflows that are often too expensive for standard methods. We benchmark the fidelity of JANA on a variety of simulation models against state-of-the-art Bayesian methods and propose a powerful and interpretable diagnostic for joint calibration. In addition, we investigate the ability of recurrent likelihood networks to emulate complex time series models without resorting to hand-crafted summary statistics.
    Understanding and Exploring the Whole Set of Good Sparse Generalized Additive Models. (arXiv:2303.16047v1 [cs.LG])
    In real applications, interaction between machine learning model and domain experts is critical; however, the classical machine learning paradigm that usually produces only a single model does not facilitate such interaction. Approximating and exploring the Rashomon set, i.e., the set of all near-optimal models, addresses this practical challenge by providing the user with a searchable space containing a diverse set of models from which domain experts can choose. We present a technique to efficiently and accurately approximate the Rashomon set of sparse, generalized additive models (GAMs). We present algorithms to approximate the Rashomon set of GAMs with ellipsoids for fixed support sets and use these ellipsoids to approximate Rashomon sets for many different support sets. The approximated Rashomon set serves as a cornerstone to solve practical challenges such as (1) studying the variable importance for the model class; (2) finding models under user-specified constraints (monotonicity, direct editing); (3) investigating sudden changes in the shape functions. Experiments demonstrate the fidelity of the approximated Rashomon set and its effectiveness in solving practical challenges.
    A Comparative Study of Federated Learning Models for COVID-19 Detection. (arXiv:2303.16141v1 [cs.LG])
    Deep learning is effective in diagnosing COVID-19 and requires a large amount of data to be effectively trained. Due to data and privacy regulations, hospitals generally have no access to data from other hospitals. Federated learning (FL) has been used to solve this problem, where it utilizes a distributed setting to train models in hospitals in a privacy-preserving manner. Deploying FL is not always feasible as it requires high computation and network communication resources. This paper evaluates five FL algorithms' performance and resource efficiency for Covid-19 detection. A decentralized setting with CNN networks is set up, and the performance of FL algorithms is compared with a centralized environment. We examined the algorithms with varying numbers of participants, federated rounds, and selection algorithms. Our results show that cyclic weight transfer can have better overall performance, and results are better with fewer participating hospitals. Our results demonstrate good performance for detecting COVID-19 patients and might be useful in deploying FL algorithms for covid-19 detection and medical image analysis in general.
    Automatically Score Tissue Images Like a Pathologist by Transfer Learning. (arXiv:2209.05954v2 [cs.LG] UPDATED)
    Cancer is the second leading cause of death in the world. Diagnosing cancer early on can save many lives. Pathologists have to look at tissue microarray (TMA) images manually to identify tumors, which can be time-consuming, inconsistent and subjective. Existing algorithms that automatically detect tumors have either not achieved the accuracy level of a pathologist or require substantial human involvements. A major challenge is that TMA images with different shapes, sizes, and locations can have the same score. Learning staining patterns in TMA images requires a huge number of images, which are severely limited due to privacy concerns and regulations in medical organizations. TMA images from different cancer types may have common characteristics that could provide valuable information, but using them directly harms the accuracy. By selective transfer learning from multiple small auxiliary sets, the proposed algorithm is able to extract knowledge from tissue images showing a ``similar" scoring pattern but with different cancer types. Remarkably, transfer learning has made it possible for the algorithm to break the critical accuracy barrier -- the proposed algorithm reports an accuracy of 75.9% on breast cancer TMA images from the Stanford Tissue Microarray Database, achieving the 75\% accuracy level of pathologists. This will allow pathologists to confidently use automatic algorithms to assist them in recognizing tumors consistently with a higher accuracy in real time.
    On the Importance of Feature Separability in Predicting Out-Of-Distribution Error. (arXiv:2303.15488v1 [cs.LG])
    Estimating the generalization performance is practically challenging on out-of-distribution (OOD) data without ground truth labels. While previous methods emphasize the connection between distribution difference and OOD accuracy, we show that a large domain gap not necessarily leads to a low test accuracy. In this paper, we investigate this problem from the perspective of feature separability, and propose a dataset-level score based upon feature dispersion to estimate the test accuracy under distribution shift. Our method is inspired by desirable properties of features in representation learning: high inter-class dispersion and high intra-class compactness. Our analysis shows that inter-class dispersion is strongly correlated with the model accuracy, while intra-class compactness does not reflect the generalization performance on OOD data. Extensive experiments demonstrate the superiority of our method in both prediction performance and computational efficiency.  ( 2 min )
    Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures. (arXiv:2303.16100v1 [cs.LG])
    Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be used to enable the execution of natural language processing (NLP) inference on mobile systems-on-chip housing custom hardware accelerators. However, while these existing solutions are effective in alleviating the latency, energy, and area costs of running single NLP tasks, achieving multi-task inference requires running computations over multiple variants of the model parameters, which are tailored to each of the targeted tasks. This approach leads to either prohibitive on-chip memory requirements or paying the cost of off-chip memory access. This paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks. The proposed model's performance and robustness to data compression methods are evaluated across several language tasks from the GLUE benchmark. Additionally, we demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator to extrapolate performance, power, and area improvements over the execution of a traditional ALBERT model on the same hardware platform.
    Dictionary Learning for the Almost-Linear Sparsity Regime. (arXiv:2210.10855v2 [cs.LG] UPDATED)
    Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in dimension $M$, yet to date, the only algorithms which provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which requires super-polynomial time in order to obtain an error which decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method on family of reweighted covariance matrices. We prove that in high enough dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when sparsity is linear in dimension up to logarithmic factors. Moreover, these accuracy guarantees have an ``oracle property" that the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the author's knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime.
    Physics-guided adversarial networks for artificial digital image correlation data generation. (arXiv:2303.15939v1 [eess.IV])
    Digital image correlation (DIC) has become a valuable tool in the evaluation of mechanical experiments, particularly fatigue crack growth experiments. The evaluation requires accurate information of the crack path and crack tip position, which is difficult to obtain due to inherent noise and artefacts. Machine learning models have been extremely successful in recognizing this relevant information given labelled DIC displacement data. For the training of robust models, which generalize well, big data is needed. However, data is typically scarce in the field of material science and engineering because experiments are expensive and time-consuming. We present a method to generate synthetic DIC displacement data using generative adversarial networks with a physics-guided discriminator. To decide whether data samples are real or fake, this discriminator additionally receives the derived von Mises equivalent strain. We show that this physics-guided approach leads to improved results in terms of visual quality of samples, sliced Wasserstein distance, and geometry score.
    Enabling Inter-organizational Analytics in Business Networks Through Meta Machine Learning. (arXiv:2303.15834v1 [cs.LG])
    Successful analytics solutions that provide valuable insights often hinge on the connection of various data sources. While it is often feasible to generate larger data pools within organizations, the application of analytics within (inter-organizational) business networks is still severely constrained. As data is distributed across several legal units, potentially even across countries, the fear of disclosing sensitive information as well as the sheer volume of the data that would need to be exchanged are key inhibitors for the creation of effective system-wide solutions -- all while still reaching superior prediction performance. In this work, we propose a meta machine learning method that deals with these obstacles to enable comprehensive analyses within a business network. We follow a design science research approach and evaluate our method with respect to feasibility and performance in an industrial use case. First, we show that it is feasible to perform network-wide analyses that preserve data confidentiality as well as limit data transfer volume. Second, we demonstrate that our method outperforms a conventional isolated analysis and even gets close to a (hypothetical) scenario where all data could be shared within the network. Thus, we provide a fundamental contribution for making business networks more effective, as we remove a key obstacle to tap the huge potential of learning from data that is scattered throughout the network.  ( 2 min )
    Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings. (arXiv:2303.15947v1 [cs.CV])
    Recording surgery in operating rooms is an essential task for education and evaluation of medical treatment. However, recording the desired targets, such as the surgery field, surgical tools, or doctor's hands, is difficult because the targets are heavily occluded during surgery. We use a recording system in which multiple cameras are embedded in the surgical lamp, and we assume that at least one camera is recording the target without occlusion at any given time. As the embedded cameras obtain multiple video sequences, we address the task of selecting the camera with the best view of the surgery. Unlike the conventional method, which selects the camera based on the area size of the surgery field, we propose a deep neural network that predicts the camera selection probability from multiple video sequences by learning the supervision of the expert annotation. We created a dataset in which six different types of plastic surgery are recorded, and we provided the annotation of camera switching. Our experiments show that our approach successfully switched between cameras and outperformed three baseline methods.  ( 2 min )
    EvoPrompting: Language Models for Code-Level Neural Architecture Search. (arXiv:2302.14838v1 [cs.NE] CROSS LISTED)
    Given the recent impressive accomplishments of language models (LMs) for code generation, we explore the use of LMs as adaptive mutation and crossover operators for an evolutionary neural architecture search (NAS) algorithm. While NAS still proves too difficult a task for LMs to succeed at solely through prompting, we find that the combination of evolutionary prompt engineering with soft prompt-tuning, a method we term EvoPrompting, consistently finds diverse and high performing models. We first demonstrate that EvoPrompting is effective on the computationally efficient MNIST-1D dataset, where EvoPrompting produces convolutional architecture variants that outperform both those designed by human experts and naive few-shot prompting in terms of accuracy and model size. We then apply our method to searching for graph neural networks on the CLRS Algorithmic Reasoning Benchmark, where EvoPrompting is able to design novel architectures that outperform current state-of-the-art models on 21 out of 30 algorithmic reasoning tasks while maintaining similar model size. EvoPrompting is successful at designing accurate and efficient neural network architectures across a variety of machine learning tasks, while also being general enough for easy adaptation to other tasks beyond neural network design.  ( 2 min )
    Thread Counting in Plain Weave for Old Paintings Using Semi-Supervised Regression Deep Learning Models. (arXiv:2303.15999v1 [cs.CV])
    In this work the authors develop regression approaches based on deep learning to perform thread density estimation for plain weave canvas analysis. Previous approaches were based on Fourier analysis, that are quite robust for some scenarios but fail in some other, in machine learning tools, that involve pre-labeling of the painting at hand, or the segmentation of thread crossing points, that provides good estimations in all scenarios with no need of pre-labeling. The segmentation approach is time-consuming as estimation of the densities is performed after locating the crossing points. In this novel proposal, we avoid this step by computing the density of threads directly from the image with a regression deep learning model. We also incorporate some improvements in the initial preprocessing of the input image with an impact on the final error. Several models are proposed and analyzed to retain the best one. Furthermore, we further reduce the density estimation error by introducing a semi-supervised approach. The performance of our novel algorithm is analyzed with works by Ribera, Vel\'azquez, and Poussin where we compare our results to the ones of previous approaches. Finally, the method is put into practice to support the change of authorship or a masterpiece at the Museo del Prado.
    Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis. (arXiv:2003.13898v3 [cs.CV] UPDATED)
    We propose a novel ECGAN for the challenging semantic image synthesis task. Although considerable improvement has been achieved, the quality of synthesized images is far from satisfactory due to three largely unresolved challenges. 1) The semantic labels do not provide detailed structural information, making it difficult to synthesize local details and structures. 2) The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss and thus cannot fully preserve the original semantic information, leading to semantically inconsistent results. 3) Existing semantic image synthesis methods focus on modeling local semantic information from a single input semantic layout. However, they ignore global semantic information of multiple input semantic layouts, i.e., semantic cross-relations between pixels across different input layouts. To tackle 1), we propose to use edge as an intermediate representation which is further adopted to guide image generation via a proposed attention guided edge transfer module. Edge information is produced by a convolutional generator and introduces detailed structure information. To tackle 2), we design an effective module to selectively highlight class-dependent feature maps according to the original semantic layout to preserve the semantic information. To tackle 3), inspired by current methods in contrastive learning, we propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content than those from different classes. Doing so can capture more semantic relations by explicitly exploring the structures of labeled pixels from multiple input semantic layouts. Experiments on three challenging datasets show that our ECGAN achieves significantly better results than state-of-the-art methods.
    Common Subexpression-based Compression and Multiplication of Sparse Constant Matrices. (arXiv:2303.16106v1 [cs.LG])
    In deep learning inference, model parameters are pruned and quantized to reduce the model size. Compression methods and common subexpression (CSE) elimination algorithms are applied on sparse constant matrices to deploy the models on low-cost embedded devices. However, the state-of-the-art CSE elimination methods do not scale well for handling large matrices. They reach hours for extracting CSEs in a $200 \times 200$ matrix while their matrix multiplication algorithms execute longer than the conventional matrix multiplication methods. Besides, there exist no compression methods for matrices utilizing CSEs. As a remedy to this problem, a random search-based algorithm is proposed in this paper to extract CSEs in the column pairs of a constant matrix. It produces an adder tree for a $1000 \times 1000$ matrix in a minute. To compress the adder tree, this paper presents a compression format by extending the Compressed Sparse Row (CSR) to include CSEs. While compression rates of more than $50\%$ can be achieved compared to the original CSR format, simulations for a single-core embedded system show that the matrix multiplication execution time can be reduced by $20\%$.  ( 2 min )
    Natural Selection Favors AIs over Humans. (arXiv:2303.16200v1 [cs.CY])
    For billions of years, evolution has been the driving force behind the development of life, including humans. Evolution endowed humans with high intelligence, which allowed us to become one of the most successful species on the planet. Today, humans aim to create artificial intelligence systems that surpass even our own intelligence. As artificial intelligences (AIs) evolve and eventually surpass us in all domains, how might evolution shape our relations with AIs? By analyzing the environment that is shaping the evolution of AIs, we argue that the most successful AI agents will likely have undesirable traits. Competitive pressures among corporations and militaries will give rise to AI agents that automate human roles, deceive others, and gain power. If such agents have intelligence that exceeds that of humans, this could lead to humanity losing control of its future. More abstractly, we argue that natural selection operates on systems that compete and vary, and that selfish species typically have an advantage over species that are altruistic to other species. This Darwinian logic could also apply to artificial agents, as agents may eventually be better able to persist into the future if they behave selfishly and pursue their own interests with little regard for humans, which could pose catastrophic risks. To counteract these risks and Darwinian forces, we consider interventions such as carefully designing AI agents' intrinsic motivations, introducing constraints on their actions, and institutions that encourage cooperation. These steps, or others that resolve the problems we pose, will be necessary in order to ensure the development of artificial intelligence is a positive one.  ( 2 min )
    FedREP: Towards Horizontal Federated Load Forecasting for Retail Energy Providers. (arXiv:2203.00219v2 [cs.DC] UPDATED)
    As Smart Meters are collecting and transmitting household energy consumption data to Retail Energy Providers (REP), the main challenge is to ensure the effective use of fine-grained consumer data while ensuring data privacy. In this manuscript, we tackle this challenge for energy load consumption forecasting in regards to REPs which is essential to energy demand management, load switching and infrastructure development. Specifically, we note that existing energy load forecasting is centralized, which are not scalable and most importantly, vulnerable to data privacy threats. Besides, REPs are individual market participants and liable to ensure the privacy of their own customers. To address this issue, we propose a novel horizontal privacy-preserving federated learning framework for REPs energy load forecasting, namely FedREP. We consider a federated learning system consisting of a control centre and multiple retailers by enabling multiple REPs to build a common, robust machine learning model without sharing data, thus addressing critical issues such as data privacy, data security and scalability. For forecasting, we use a state-of-the-art Long Short-Term Memory (LSTM) neural network due to its ability to learn long term sequences of observations and promises of higher accuracy with time-series data while solving the vanishing gradient problem. Finally, we conduct extensive data-driven experiments using a real energy consumption dataset. Experimental results demonstrate that our proposed federated learning framework can achieve sufficient performance in terms of MSE ranging between 0.3 to 0.4 and is relatively similar to that of a centralized approach while preserving privacy and improving scalability.  ( 3 min )
    A neural operator-based surrogate solver for free-form electromagnetic inverse design. (arXiv:2302.01934v2 [physics.comp-ph] UPDATED)
    Neural operators have emerged as a powerful tool for solving partial differential equations in the context of scientific machine learning. Here, we implement and train a modified Fourier neural operator as a surrogate solver for electromagnetic scattering problems and compare its data efficiency to existing methods. We further demonstrate its application to the gradient-based nanophotonic inverse design of free-form, fully three-dimensional electromagnetic scatterers, an area that has so far eluded the application of deep learning techniques.  ( 2 min )
    Energy Regularized RNNs for Solving Non-Stationary Bandit Problems. (arXiv:2303.06552v2 [cs.LG] UPDATED)
    We consider a Multi-Armed Bandit problem in which the rewards are non-stationary and are dependent on past actions and potentially on past contexts. At the heart of our method, we employ a recurrent neural network, which models these sequences. In order to balance between exploration and exploitation, we present an energy minimization term that prevents the neural network from becoming too confident in support of a certain action. This term provably limits the gap between the maximal and minimal probabilities assigned by the network. In a diverse set of experiments, we demonstrate that our method is at least as effective as methods suggested to solve the sub-problem of Rotting Bandits, and can solve intuitive extensions of various benchmark problems. We share our implementation at https://github.com/rotmanmi/Energy-Regularized-RNN.  ( 2 min )
    PDExplain: Contextual Modeling of PDEs in the Wild. (arXiv:2303.15827v1 [cs.LG])
    We propose an explainable method for solving Partial Differential Equations by using a contextual scheme called PDExplain. During the training phase, our method is fed with data collected from an operator-defined family of PDEs accompanied by the general form of this family. In the inference phase, a minimal sample collected from a phenomenon is provided, where the sample is related to the PDE family but not necessarily to the set of specific PDEs seen in the training phase. We show how our algorithm can predict the PDE solution for future timesteps. Moreover, our method provides an explainable form of the PDE, a trait that can assist in modelling phenomena based on data in physical sciences. To verify our method, we conduct extensive experimentation, examining its quality both in terms of prediction error and explainability.  ( 2 min )
    Information-Theoretic GAN Compression with Variational Energy-based Model. (arXiv:2303.16050v1 [cs.CV])
    We propose an information-theoretic knowledge distillation approach for the compression of generative adversarial networks, which aims to maximize the mutual information between teacher and student networks via a variational optimization based on an energy-based model. Because the direct computation of the mutual information in continuous domains is intractable, our approach alternatively optimizes the student network by maximizing the variational lower bound of the mutual information. To achieve a tight lower bound, we introduce an energy-based model relying on a deep neural network to represent a flexible variational distribution that deals with high-dimensional images and consider spatial dependencies between pixels, effectively. Since the proposed method is a generic optimization algorithm, it can be conveniently incorporated into arbitrary generative adversarial networks and even dense prediction networks, e.g., image enhancement models. We demonstrate that the proposed algorithm achieves outstanding performance in model compression of generative adversarial networks consistently when combined with several existing models.
    AtteSTNet -- An attention and subword tokenization based approach for code-switched text hate speech detection. (arXiv:2112.11479v3 [cs.CL] UPDATED)
    Recent advancements in technology have led to a boost in social media usage which has ultimately led to large amounts of user-generated data which also includes hateful and offensive speech. The language used in social media is often a combination of English and the native language in the region. In India, Hindi is used predominantly and is often code-switched with English, giving rise to the Hinglish (Hindi+English) language. Various approaches have been made in the past to classify the code-mixed Hinglish hate speech using different machine learning and deep learning-based techniques. However, these techniques make use of recurrence on convolution mechanisms which are computationally expensive and have high memory requirements. Past techniques also make use of complex data processing making the existing techniques very complex and non-sustainable to change in data. Proposed work gives a much simpler approach which is not only at par with these complex networks but also exceeds performance with the use of subword tokenization algorithms like BPE and Unigram, along with multi-head attention-based techniques, giving an accuracy of 87.41% and an F1 score of 0.851 on standard datasets. Efficient use of BPE and Unigram algorithms help handle the nonconventional Hinglish vocabulary making the proposed technique simple, efficient and sustainable to use in the real world.
    Learning to Generalize Provably in Learning to Optimize. (arXiv:2302.11085v2 [cs.LG] UPDATED)
    Learning to optimize (L2O) has gained increasing popularity, which automates the design of optimizers by data-driven approaches. However, current L2O methods often suffer from poor generalization performance in at least two folds: (i) applying the L2O-learned optimizer to unseen optimizees, in terms of lowering their loss function values (optimizer generalization, or ``generalizable learning of optimizers"); and (ii) the test performance of an optimizee (itself as a machine learning model), trained by the optimizer, in terms of the accuracy over unseen data (optimizee generalization, or ``learning to generalize"). While the optimizer generalization has been recently studied, the optimizee generalization (or learning to generalize) has not been rigorously studied in the L2O context, which is the aim of this paper. We first theoretically establish an implicit connection between the local entropy and the Hessian, and hence unify their roles in the handcrafted design of generalizable optimizers as equivalent metrics of the landscape flatness of loss functions. We then propose to incorporate these two metrics as flatness-aware regularizers into the L2O framework in order to meta-train optimizers to learn to generalize, and theoretically show that such generalization ability can be learned during the L2O meta-training process and then transformed to the optimizee loss function. Extensive experiments consistently validate the effectiveness of our proposals with substantially improved generalization on multiple sophisticated L2O models and diverse optimizees. Our code is available at: https://github.com/VITA-Group/Open-L2O/tree/main/Model_Free_L2O/L2O-Entropy.
    Solving Regularized Exp, Cosh and Sinh Regression Problems. (arXiv:2303.15725v1 [cs.LG])
    In modern machine learning, attention computation is a fundamental task for training large language models such as Transformer, GPT-4 and ChatGPT. In this work, we study exponential regression problem which is inspired by the softmax/exp unit in the attention mechanism in large language models. The standard exponential regression is non-convex. We study the regularization version of exponential regression problem which is a convex problem. We use approximate newton method to solve in input sparsity time. Formally, in this problem, one is given matrix $A \in \mathbb{R}^{n \times d}$, $b \in \mathbb{R}^n$, $w \in \mathbb{R}^n$ and any of functions $\exp, \cosh$ and $\sinh$ denoted as $f$. The goal is to find the optimal $x$ that minimize $ 0.5 \| f(Ax) - b \|_2^2 + 0.5 \| \mathrm{diag}(w) A x \|_2^2$. The straightforward method is to use the naive Newton's method. Let $\mathrm{nnz}(A)$ denote the number of non-zeros entries in matrix $A$. Let $\omega$ denote the exponent of matrix multiplication. Currently, $\omega \approx 2.373$. Let $\epsilon$ denote the accuracy error. In this paper, we make use of the input sparsity and purpose an algorithm that use $\log ( \|x_0 - x^*\|_2 / \epsilon)$ iterations and $\widetilde{O}(\mathrm{nnz}(A) + d^{\omega} )$ per iteration time to solve the problem.
    Data Augmentation techniques in time series domain: A survey and taxonomy. (arXiv:2206.13508v3 [cs.LG] UPDATED)
    With the latest advances in Deep Learning-based generative models, it has not taken long to take advantage of their remarkable performance in the area of time series. Deep neural networks used to work with time series heavily depend on the size and consistency of the datasets used in training. These features are not usually abundant in the real world, where they are usually limited and often have constraints that must be guaranteed. Therefore, an effective way to increase the amount of data is by using Data Augmentation techniques, either by adding noise or permutations and by generating new synthetic data. This work systematically reviews the current state-of-the-art in the area to provide an overview of all available algorithms and proposes a taxonomy of the most relevant research. The efficiency of the different variants will be evaluated as a central part of the process, as well as the different metrics to evaluate the performance and the main problems concerning each model will be analysed. The ultimate aim of this study is to provide a summary of the evolution and performance of areas that produce better results to guide future researchers in this field.
    Towards a User Privacy-Aware Mobile Gaming App Installation Prediction Model. (arXiv:2302.03332v2 [cs.LG] UPDATED)
    Over the past decade, programmatic advertising has received a great deal of attention in the online advertising industry. A real-time bidding (RTB) system is rapidly becoming the most popular method to buy and sell online advertising impressions. Within the RTB system, demand-side platforms (DSP) aim to spend advertisers' campaign budgets efficiently while maximizing profit, seeking impressions that result in high user responses, such as clicks or installs. In the current study, we investigate the process of predicting a mobile gaming app installation from the point of view of a particular DSP, while paying attention to user privacy, and exploring the trade-off between privacy preservation and model performance. There are multiple levels of potential threats to user privacy, depending on the privacy leaks associated with the data-sharing process, such as data transformation or de-anonymization. To address these concerns, privacy-preserving techniques were proposed, such as cryptographic approaches, for training privacy-aware machine-learning models. However, the ability to train a mobile gaming app installation prediction model without using user-level data, can prevent these threats and protect the users' privacy, even though the model's ability to predict may be impaired. Additionally, current laws might force companies to declare that they are collecting data, and might even give the user the option to opt out of such data collection, which might threaten companies' business models in digital advertising, which are dependent on the collection and use of user-level data. We conclude that privacy-aware models might still preserve significant capabilities, enabling companies to make better decisions, dependent on the privacy-efficacy trade-off utility function of each case.
    A Survey on Malware Detection with Graph Representation Learning. (arXiv:2303.16004v1 [cs.CR])
    Malware detection has become a major concern due to the increasing number and complexity of malware. Traditional detection methods based on signatures and heuristics are used for malware detection, but unfortunately, they suffer from poor generalization to unknown attacks and can be easily circumvented using obfuscation techniques. In recent years, Machine Learning (ML) and notably Deep Learning (DL) achieved impressive results in malware detection by learning useful representations from data and have become a solution preferred over traditional methods. More recently, the application of such techniques on graph-structured data has achieved state-of-the-art performance in various domains and demonstrates promising results in learning more robust representations from malware. Yet, no literature review focusing on graph-based deep learning for malware detection exists. In this survey, we provide an in-depth literature review to summarize and unify existing works under the common approaches and architectures. We notably demonstrate that Graph Neural Networks (GNNs) reach competitive results in learning robust embeddings from malware represented as expressive graph structures, leading to an efficient detection by downstream classifiers. This paper also reviews adversarial attacks that are utilized to fool graph-based detection methods. Challenges and future research directions are discussed at the end of the paper.
    Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes. (arXiv:2303.15846v1 [cs.CL])
    We investigate different natural language processing (NLP) approaches based on contextualised word representations for the problem of early prediction of lung cancer using free-text patient medical notes of Dutch primary care physicians. Because lung cancer has a low prevalence in primary care, we also address the problem of classification under highly imbalanced classes. Specifically, we use large Transformer-based pretrained language models (PLMs) and investigate: 1) how \textit{soft prompt-tuning} -- an NLP technique used to adapt PLMs using small amounts of training data -- compares to standard model fine-tuning; 2) whether simpler static word embedding models (WEMs) can be more robust compared to PLMs in highly imbalanced settings; and 3) how models fare when trained on notes from a small number of patients. We find that 1) soft-prompt tuning is an efficient alternative to standard model fine-tuning; 2) PLMs show better discrimination but worse calibration compared to simpler static word embedding models as the classification problem becomes more imbalanced; and 3) results when training models on small number of patients are mixed and show no clear differences between PLMs and WEMs. All our code is available open source in \url{https://bitbucket.org/aumc-kik/prompt_tuning_cancer_prediction/}.
    Stitchable Neural Networks. (arXiv:2302.06586v3 [cs.LG] UPDATED)
    The public model zoo containing enormous powerful pretrained model families (e.g., ResNet/DeiT) has reached an unprecedented scope than ever, which significantly contributes to the success of deep learning. As each model family consists of pretrained models with diverse scales (e.g., DeiT-Ti/S/B), it naturally arises a fundamental question of how to efficiently assemble these readily available models in a family for dynamic accuracy-efficiency trade-offs at runtime. To this end, we present Stitchable Neural Networks (SN-Net), a novel scalable and efficient framework for model deployment. It cheaply produces numerous networks with different complexity and performance trade-offs given a family of pretrained neural networks, which we call anchors. Specifically, SN-Net splits the anchors across the blocks/layers and then stitches them together with simple stitching layers to map the activations from one anchor to another. With only a few epochs of training, SN-Net effectively interpolates between the performance of anchors with varying scales. At runtime, SN-Net can instantly adapt to dynamic resource constraints by switching the stitching positions. Extensive experiments on ImageNet classification demonstrate that SN-Net can obtain on-par or even better performance than many individually trained networks while supporting diverse deployment scenarios. For example, by stitching Swin Transformers, we challenge hundreds of models in Timm model zoo with a single network. We believe this new elastic model framework can serve as a strong baseline for further research in wider communities.
    FINDE: Neural Differential Equations for Finding and Preserving Invariant Quantities. (arXiv:2210.00272v2 [cs.LG] UPDATED)
    Many real-world dynamical systems are associated with first integrals (a.k.a. invariant quantities), which are quantities that remain unchanged over time. The discovery and understanding of first integrals are fundamental and important topics both in the natural sciences and in industrial applications. First integrals arise from the conservation laws of system energy, momentum, and mass, and from constraints on states; these are typically related to specific geometric structures of the governing equations. Existing neural networks designed to ensure such first integrals have shown excellent accuracy in modeling from data. However, these models incorporate the underlying structures, and in most situations where neural networks learn unknown systems, these structures are also unknown. This limitation needs to be overcome for scientific discovery and modeling of unknown systems. To this end, we propose first integral-preserving neural differential equation (FINDE). By leveraging the projection method and the discrete gradient method, FINDE finds and preserves first integrals from data, even in the absence of prior knowledge about underlying structures. Experimental results demonstrate that FINDE can predict future states of target systems much longer and find various quantities consistent with well-known first integrals in a unified manner.
    Heat flux for semi-local machine-learning potentials. (arXiv:2303.14434v2 [cond-mat.mtrl-sci] UPDATED)
    The Green-Kubo (GK) method is a rigorous framework for heat transport simulations in materials. However, it requires an accurate description of the potential-energy surface and carefully converged statistics. Machine-learning potentials can achieve the accuracy of first-principles simulations while allowing to reach well beyond their simulation time and length scales at a fraction of the cost. In this paper, we explain how to apply the GK approach to the recent class of message-passing machine-learning potentials, which iteratively consider semi-local interactions beyond the initial interaction cutoff. We derive an adapted heat flux formulation that can be implemented using automatic differentiation without compromising computational efficiency. The approach is demonstrated and validated by calculating the thermal conductivity of zirconium dioxide across temperatures.
    CoGANPPIS: Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction. (arXiv:2303.06945v2 [q-bio.QM] UPDATED)
    Protein-protein interactions are essential in biochemical processes. Accurate prediction of the protein-protein interaction sites (PPIs) deepens our understanding of biological mechanism and is crucial for new drug design. However, conventional experimental methods for PPIs prediction are costly and time-consuming so that many computational approaches, especially ML-based methods, have been developed recently. Although these approaches have achieved gratifying results, there are still two limitations: (1) Most models have excavated some useful input features, but failed to take coevolutionary features into account, which could provide clues for inter-residue relationships; (2) The attention-based models only allocate attention weights for neighboring residues, instead of doing it globally, neglecting that some residues being far away from the target residues might also matter. We propose a coevolution-enhanced global attention neural network, a sequence-based deep learning model for PPIs prediction, called CoGANPPIS. It utilizes three layers in parallel for feature extraction: (1) Local-level representation aggregation layer, which aggregates the neighboring residues' features; (2) Global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all the residues on the same protein sequences; (3) Coevolutionary information learning layer, which applies CNN & pooling to coevolutionary information to obtain the coevolutionary profile representation. Then, the three outputs are concatenated and passed into several fully connected layers for the final prediction. Application on two benchmark datasets demonstrated a state-of-the-art performance of our model. The source code is publicly available at https://github.com/Slam1423/CoGANPPIS_source_code.
    Machine Learning in Orbit Estimation: a Survey. (arXiv:2207.08993v3 [astro-ph.EP] UPDATED)
    Since the late 1950s, when the first artificial satellite was launched, the number of Resident Space Objects has steadily increased. It is estimated that around one million objects larger than one cm are currently orbiting the Earth, with only thirty thousand larger than ten cm being tracked. To avert a chain reaction of collisions, known as Kessler Syndrome, it is essential to accurately track and predict debris and satellites' orbits. Current approximate physics-based methods have errors in the order of kilometers for seven-day predictions, which is insufficient when considering space debris, typically with less than one meter. This failure is usually due to uncertainty around the state of the space object at the beginning of the trajectory, forecasting errors in environmental conditions such as atmospheric drag, and unknown characteristics such as the mass or geometry of the space object. Operators can enhance Orbit Prediction accuracy by deriving unmeasured objects' characteristics and improving non-conservative forces' effects by leveraging data-driven techniques, such as Machine Learning. In this survey, we provide an overview of the work in applying Machine Learning for Orbit Determination, Orbit Prediction, and atmospheric density modeling.
    LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. (arXiv:2303.16199v1 [cs.CV])
    We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the input text tokens at higher transformer layers. Then, a zero-init attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserves its pre-trained knowledge. With efficient training, LLaMA-Adapter generates high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Furthermore, our approach can be simply extended to multi-modal input, e.g., images, for image-conditioned LLaMA, which achieves superior reasoning capacity on ScienceQA. We release our code at https://github.com/ZrrSkywalker/LLaMA-Adapter.
    Geometry of Radial Basis Neural Networks for Safety Biased Approximation of Unsafe Regions. (arXiv:2210.05596v2 [eess.SY] UPDATED)
    Barrier function-based inequality constraints are a means to enforce safety specifications for control systems. When used in conjunction with a convex optimization program, they provide a computationally efficient method to enforce safety for the general class of control-affine systems. One of the main assumptions when taking this approach is the a priori knowledge of the barrier function itself, i.e., knowledge of the safe set. In the context of navigation through unknown environments where the locally safe set evolves with time, such knowledge does not exist. This manuscript focuses on the synthesis of a zeroing barrier function characterizing the safe set based on safe and unsafe sample measurements, e.g., from perception data in navigation applications. Prior work formulated a supervised machine learning algorithm whose solution guaranteed the construction of a zeroing barrier function with specific level-set properties. However, it did not explore the geometry of the neural network design used for the synthesis process. This manuscript describes the specific geometry of the neural network used for zeroing barrier function synthesis, and shows how the network provides the necessary representation for splitting the state space into safe and unsafe regions.
    Iterative label cleaning for transductive and semi-supervised few-shot learning. (arXiv:2012.07962v3 [cs.LG] UPDATED)
    Few-shot learning amounts to learning representations and acquiring knowledge such that novel tasks may be solved with both supervision and data being limited. Improved performance is possible by transductive inference, where the entire test set is available concurrently, and semi-supervised learning, where more unlabeled data is available. Focusing on these two settings, we introduce a new algorithm that leverages the manifold structure of the labeled and unlabeled data distribution to predict pseudo-labels, while balancing over classes and using the loss value distribution of a limited-capacity classifier to select the cleanest labels, iteratively improving the quality of pseudo-labels. Our solution surpasses or matches the state of the art results on four benchmark datasets, namely miniImageNet, tieredImageNet, CUB and CIFAR-FS, while being robust over feature space pre-processing and the quantity of available data. The publicly available source code can be found in https://github.com/MichalisLazarou/iLPC.
    Knowledge Enhanced Graph Neural Networks. (arXiv:2303.15487v1 [cs.AI])
    Graph data is omnipresent and has a large variety of applications such as natural science, social networks or semantic web. Though rich in information, graphs are often noisy and incomplete. Therefore, graph completion tasks such as node classification or link prediction have gained attention. On the one hand, neural methods such as graph neural networks have proven to be robust tools for learning rich representations of noisy graphs. On the other hand, symbolic methods enable exact reasoning on graphs. We propose KeGNN, a neuro-symbolic framework for learning on graph data that combines both paradigms and allows for the integration of prior knowledge into a graph neural network model. In essence, KeGNN consists of a graph neural network as a base on which knowledge enhancement layers are stacked with the objective of refining predictions with respect to prior knowledge. We instantiate KeGNN in conjunction with two standard graph neural networks: Graph Convolutional Networks and Graph Attention Networks, and evaluate KeGNN on multiple benchmark datasets for node classification.
    HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations. (arXiv:2303.15604v1 [q-bio.BM])
    Publicly available collections of drug-like molecules have grown to comprise 10s of billions of possibilities in recent history due to advances in chemical synthesis. Traditional methods for identifying ``hit'' molecules from a large collection of potential drug-like candidates have relied on biophysical theory to compute approximations to the Gibbs free energy of the binding interaction between the drug to its protein target. A major drawback of the approaches is that they require exceptional computing capabilities to consider for even relatively small collections of molecules. Hyperdimensional Computing (HDC) is a recently proposed learning paradigm that is able to leverage low-precision binary vector arithmetic to build efficient representations of the data that can be obtained without the need for gradient-based optimization approaches that are required in many conventional machine learning and deep learning approaches. This algorithmic simplicity allows for acceleration in hardware that has been previously demonstrated for a range of application areas. We consider existing HDC approaches for molecular property classification and introduce two novel encoding algorithms that leverage the extended connectivity fingerprint (ECFP) algorithm. We show that HDC-based inference methods are as much as 90 times more efficient than more complex representative machine learning methods and achieve an acceleration of nearly 9 orders of magnitude as compared to inference with molecular docking. We demonstrate multiple approaches for the encoding of molecular data for HDC and examine their relative performance on a range of challenging molecular property prediction and drug-protein binding classification tasks. Our work thus motivates further investigation into molecular representation learning to develop ultra-efficient pre-screening tools.
    Explicit Planning Helps Language Models in Logical Reasoning. (arXiv:2303.15714v1 [cs.CL])
    Language models have been shown to perform remarkably well on a wide range of natural language processing tasks. In this paper, we propose a novel system that uses language models to perform multi-step logical reasoning. Our system incorporates explicit planning into its inference procedure, thus able to make more informed reasoning decisions at each step by looking ahead into their future effects. In our experiments, our full system significantly outperforms other competing systems. On a multiple-choice question answering task, our system performs competitively compared to GPT-3-davinci despite having only around 1.5B parameters. We conduct several ablation studies to demonstrate that explicit planning plays a crucial role in the system's performance.
    VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs. (arXiv:2303.15893v1 [cs.CV])
    We introduce VIVE3D, a novel approach that extends the capabilities of image-based 3D GANs to video editing and is able to represent the input video in an identity-preserving and temporally consistent way. We propose two new building blocks. First, we introduce a novel GAN inversion technique specifically tailored to 3D GANs by jointly embedding multiple frames and optimizing for the camera parameters. Second, besides traditional semantic face edits (e.g. for age and expression), we are the first to demonstrate edits that show novel views of the head enabled by the inherent properties of 3D GANs and our optical flow-guided compositing technique to combine the head with the background video. Our experiments demonstrate that VIVE3D generates high-fidelity face edits at consistent quality from a range of camera viewpoints which are composited with the original video in a temporally and spatially consistent manner.
    Complementary Domain Adaptation and Generalization for Unsupervised Continual Domain Shift Learning. (arXiv:2303.15833v1 [cs.LG])
    Continual domain shift poses a significant challenge in real-world applications, particularly in situations where labeled data is not available for new domains. The challenge of acquiring knowledge in this problem setting is referred to as unsupervised continual domain shift learning. Existing methods for domain adaptation and generalization have limitations in addressing this issue, as they focus either on adapting to a specific domain or generalizing to unseen domains, but not both. In this paper, we propose Complementary Domain Adaptation and Generalization (CoDAG), a simple yet effective learning framework that combines domain adaptation and generalization in a complementary manner to achieve three major goals of unsupervised continual domain shift learning: adapting to a current domain, generalizing to unseen domains, and preventing forgetting of previously seen domains. Our approach is model-agnostic, meaning that it is compatible with any existing domain adaptation and generalization algorithms. We evaluate CoDAG on several benchmark datasets and demonstrate that our model outperforms state-of-the-art models in all datasets and evaluation metrics, highlighting its effectiveness and robustness in handling unsupervised continual domain shift learning.
    Machine learning tools to improve nonlinear modeling parameters of RC columns. (arXiv:2303.16140v1 [cs.LG])
    Modeling parameters are essential to the fidelity of nonlinear models of concrete structures subjected to earthquake ground motions, especially when simulating seismic events strong enough to cause collapse. This paper addresses two of the most significant barriers to improving nonlinear modeling provisions in seismic evaluation standards using experimental data sets: identifying the most likely mode of failure of structural components, and implementing data fitting techniques capable of recognizing interdependencies between input parameters and nonlinear relationships between input parameters and model outputs. Machine learning tools in the Scikit-learn and Pytorch libraries were used to calibrate equations and black-box numerical models for nonlinear modeling parameters (MP) a and b of reinforced concrete columns defined in the ASCE 41 and ACI 369.1 standards, and to estimate their most likely mode of failure. It was found that machine learning regression models and machine learning black-boxes were more accurate than current provisions in the ACI 369.1/ASCE 41 Standards. Among the regression models, Regularized Linear Regression was the most accurate for estimating MP a, and Polynomial Regression was the most accurate for estimating MP b. The two black-box models evaluated, namely the Gaussian Process Regression and the Neural Network (NN), provided the most accurate estimates of MPs a and b. The NN model was the most accurate machine learning tool of all evaluated. A multi-class classification tool from the Scikit-learn machine learning library correctly identified column mode of failure with 79% accuracy for rectangular columns and with 81% accuracy for circular columns, a substantial improvement over the classification rules in ASCE 41-13.
    Distributed Graph Embedding with Information-Oriented Random Walks. (arXiv:2303.15702v1 [cs.DC])
    Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks. The increasing availability of billion-edge graphs underscores the importance of learning efficient and effective embeddings on large graphs, such as link prediction on Twitter with over one billion edges. Most existing graph embedding methods fall short of reaching high data scalability. In this paper, we present a general-purpose, distributed, information-centric random walk-based graph embedding framework, DistGER, which can scale to embed billion-edge graphs. DistGER incrementally computes information-centric random walks. It further leverages a multi-proximity-aware, streaming, parallel graph partitioning strategy, simultaneously achieving high local partition quality and excellent workload balancing across machines. DistGER also improves the distributed Skip-Gram learning model to generate node embeddings by optimizing the access locality, CPU throughput, and synchronization efficiency. Experiments on real-world graphs demonstrate that compared to state-of-the-art distributed graph embedding frameworks, including KnightKing, DistDGL, and Pytorch-BigGraph, DistGER exhibits 2.33x-129x acceleration, 45% reduction in cross-machines communication, and > 10% effectiveness improvement in downstream tasks.
    Plasticity Neural Network Based on Astrocytic effects at Critical Period, Synaptic Competition and Strength Rebalance by Current and Mnemonic Brain Plasticity and Synapse Formation. (arXiv:2203.11740v12 [cs.NE] UPDATED)
    In addition to the weights of synaptic shared connections, PNN includes weights of synaptic effective ranges [14-25]. PNN considers synaptic strength balance in dynamic of phagocytosing of synapses and static of constant sum of synapses length [14], and includes the lead behavior of the school of fish. Synapse formation will inhibit dendrites generation in experiments and simulations [15]. The memory persistence gradient of retrograde circuit similar to the Enforcing Resilience in a Spring Boot. The relatively good and inferior gradient information stored in memory engram cells in synapse formation of retrograde circuit like the folds in brain [16]. The controversy was claimed if human hippocampal neurogenesis persists throughout aging, may have a new and longer circuit in late iteration [17,18]. Closing the critical period will cause neurological disorder in experiments and simulations [19]. Considering both negative and positive memories persistence help activate synapse better than only considering positive memory [20]. Astrocytic phagocytosis will avoid the local accumulation of synapses by simulation, lack of astrocytic phagocytosis causes excitatory synapses and functionally impaired synapses accumulate in experiments and lead to destruction of cognition, but local longer synapses and worse results in simulations [21]. It gives relationship of intelligence and cortical thickness, individual differences [22]. The PNN considered the memory engram cells that strengthened Synapses [23]. The effects of PNN's memory structure and tPBM may be the same for powerful penetrability of signals [24].And extracted relatively good or inferior memories both have exponential decay by non-classical quantum computing[25]. Memory persistence inhibit local synaptic accumulation. It may introduce the relatively good and inferior solution in PSO. The simple PNN only has the synaptic phagocytosis.
    CREATED: Generating Viable Counterfactual Sequences for Predictive Process Analytics. (arXiv:2303.15844v1 [cs.LG])
    Predictive process analytics focuses on predicting future states, such as the outcome of running process instances. These techniques often use machine learning models or deep learning models (such as LSTM) to make such predictions. However, these deep models are complex and difficult for users to understand. Counterfactuals answer ``what-if'' questions, which are used to understand the reasoning behind the predictions. For example, what if instead of emailing customers, customers are being called? Would this alternative lead to a different outcome? Current methods to generate counterfactual sequences either do not take the process behavior into account, leading to generating invalid or infeasible counterfactual process instances, or heavily rely on domain knowledge. In this work, we propose a general framework that uses evolutionary methods to generate counterfactual sequences. Our framework does not require domain knowledge. Instead, we propose to train a Markov model to compute the feasibility of generated counterfactual sequences and adapt three other measures (delta in outcome prediction, similarity, and sparsity) to ensure their overall viability. The evaluation shows that we generate viable counterfactual sequences, outperform baseline methods in viability, and yield similar results when compared to the state-of-the-art method that requires domain knowledge.
    Item Graph Convolution Collaborative Filtering for Inductive Recommendations. (arXiv:2303.15946v1 [cs.IR])
    Graph Convolutional Networks (GCN) have been recently employed as core component in the construction of recommender system algorithms, interpreting user-item interactions as the edges of a bipartite graph. However, in the absence of side information, the majority of existing models adopt an approach of randomly initialising the user embeddings and optimising them throughout the training process. This strategy makes these algorithms inherently transductive, curtailing their ability to generate predictions for users that were unseen at training time. To address this issue, we propose a convolution-based algorithm, which is inductive from the user perspective, while at the same time, depending only on implicit user-item interaction data. We propose the construction of an item-item graph through a weighted projection of the bipartite interaction network and to employ convolution to inject higher order associations into item embeddings, while constructing user representations as weighted sums of the items with which they have interacted. Despite not training individual embeddings for each user our approach achieves state of-the-art recommendation performance with respect to transductive baselines on four real-world datasets, showing at the same time robust inductive performance.
    From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers. (arXiv:2107.07999v8 [cs.LG] UPDATED)
    In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformers architectures in a scalable way. We show that recent results on linear causal attention (Choromanski et al., 2021) and log-linear RPE-attention (Luo et al., 2021) are special cases of this general mechanism. However by casting the problem as a topological (graph-based) modulation of unmasked attention, we obtain several results unknown before, including efficient d-dimensional RPE-masking and graph-kernel masking. We leverage many mathematical techniques ranging from spectral analysis through dynamic programming and random walks to new algorithms for solving Markov processes on graphs. We provide a corresponding empirical evaluation.
    Hyperbolic Geometry in Computer Vision: A Novel Framework for Convolutional Neural Networks. (arXiv:2303.15919v1 [cs.CV])
    Real-world visual data exhibit intrinsic hierarchical structures that can be represented effectively in hyperbolic spaces. Hyperbolic neural networks (HNNs) are a promising approach for learning feature representations in such spaces. However, current methods in computer vision rely on Euclidean backbones and only project features to the hyperbolic space in the task heads, limiting their ability to fully leverage the benefits of hyperbolic geometry. To address this, we present HCNN, the first fully hyperbolic convolutional neural network (CNN) designed for computer vision tasks. Based on the Lorentz model, we generalize fundamental components of CNNs and propose novel formulations of the convolutional layer, batch normalization, and multinomial logistic regression (MLR). Experimentation on standard vision tasks demonstrates the effectiveness of our HCNN framework and the Lorentz model in both hybrid and fully hyperbolic settings. Overall, we aim to pave the way for future research in hyperbolic computer vision by offering a new paradigm for interpreting and analyzing visual data. Our code is publicly available at https://github.com/kschwethelm/HyperbolicCV.
    Trainable Projected Gradient Method for Robust Fine-tuning. (arXiv:2303.10720v2 [cs.CV] UPDATED)
    Recent studies on transfer learning have shown that selectively fine-tuning a subset of layers or customizing different learning rates for each layer can greatly improve robustness to out-of-distribution (OOD) data and retain generalization capability in the pre-trained models. However, most of these methods employ manually crafted heuristics or expensive hyper-parameter searches, which prevent them from scaling up to large datasets and neural networks. To solve this problem, we propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization. This is motivated by formulating fine-tuning as a bi-level constrained optimization problem. Specifically, TPGM maintains a set of projection radii, i.e., distance constraints between the fine-tuned model and the pre-trained model, for each layer, and enforces them through weight projections. To learn the constraints, we propose a bi-level optimization to automatically learn the best set of projection radii in an end-to-end manner. Theoretically, we show that the bi-level optimization formulation could explain the regularization capability of TPGM. Empirically, with little hyper-parameter search cost, TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance. For example, when fine-tuned on DomainNet-Real and ImageNet, compared to vanilla fine-tuning, TPGM shows $22\%$ and $10\%$ relative OOD improvement respectively on their sketch counterparts. Code is available at \url{https://github.com/PotatoTian/TPGM}.
    VIDIMU. Multimodal video and IMU kinematic dataset on daily life activities using affordable devices. (arXiv:2303.16150v1 [cs.CV])
    Human activity recognition and clinical biomechanics are challenging problems in physical telerehabilitation medicine. However, most publicly available datasets on human body movements cannot be used to study both problems in an out-of-the-lab movement acquisition setting. The objective of the VIDIMU dataset is to pave the way towards affordable patient tracking solutions for remote daily life activities recognition and kinematic analysis. The dataset includes 13 activities registered using a commodity camera and five inertial sensors. The video recordings were acquired in 54 subjects, of which 16 also had simultaneous recordings of inertial sensors. The novelty of VIDIMU lies in: i) the clinical relevance of the chosen movements, ii) the combined utilization of affordable video and custom sensors, and iii) the implementation of state-of-the-art tools for multimodal data processing of 3D body pose tracking and motion reconstruction in a musculoskeletal model from inertial data. The validation confirms that a minimally disturbing acquisition protocol, performed according to real-life conditions can provide a comprehensive picture of human joint angles during daily life activities.
    BC-IRL: Learning Generalizable Reward Functions from Demonstrations. (arXiv:2303.16194v1 [cs.LG])
    How well do reward functions learned with inverse reinforcement learning (IRL) generalize? We illustrate that state-of-the-art IRL algorithms, which maximize a maximum-entropy objective, learn rewards that overfit to the demonstrations. Such rewards struggle to provide meaningful rewards for states not covered by the demonstrations, a major detriment when using the reward to learn policies in new situations. We introduce BC-IRL a new inverse reinforcement learning method that learns reward functions that generalize better when compared to maximum-entropy IRL approaches. In contrast to the MaxEnt framework, which learns to maximize rewards around demonstrations, BC-IRL updates reward parameters such that the policy trained with the new reward matches the expert demonstrations better. We show that BC-IRL learns rewards that generalize better on an illustrative simple task and two continuous robotic control tasks, achieving over twice the success rate of baselines in challenging generalization settings.
    Cerebral Palsy Prediction with Frequency Attention Informed Graph Convolutional Networks. (arXiv:2204.10997v2 [cs.CV] UPDATED)
    Early diagnosis and intervention are clinically considered the paramount part of treating cerebral palsy (CP), so it is essential to design an efficient and interpretable automatic prediction system for CP. We highlight a significant difference between CP infants' frequency of human movement and that of the healthy group, which improves prediction performance. However, the existing deep learning-based methods did not use the frequency information of infants' movement for CP prediction. This paper proposes a frequency attention informed graph convolutional network and validates it on two consumer-grade RGB video datasets, namely MINI-RGBD and RVI-38 datasets. Our proposed frequency attention module aids in improving both classification performance and system interpretability. In addition, we design a frequency-binning method that retains the critical frequency of the human joint position data while filtering the noise. Our prediction performance achieves state-of-the-art research on both datasets. Our work demonstrates the effectiveness of frequency information in supporting the prediction of CP non-intrusively and provides a way for supporting the early diagnosis of CP in the resource-limited regions where the clinical resources are not abundant.
    Nonlinear classifiers for ranking problems based on kernelized SVM. (arXiv:2002.11436v2 [cs.LG] UPDATED)
    Many classification problems focus on maximizing the performance only on the samples with the highest relevance instead of all samples. As an example, we can mention ranking problems, accuracy at the top or search engines where only the top few queries matter. In our previous work, we derived a general framework including several classes of these linear classification problems. In this paper, we extend the framework to nonlinear classifiers. Utilizing a similarity to SVM, we dualize the problems, add kernels and propose a componentwise dual ascent method.
    Multi-view information fusion using multi-view variational autoencoders to predict proximal femoral strength. (arXiv:2210.00674v2 [cs.LG] UPDATED)
    The aim of this paper is to design a deep learning-based model to predict proximal femoral strength using multi-view information fusion. Method: We developed new models using multi-view variational autoencoder (MVAE) for feature representation learning and a product of expert (PoE) model for multi-view information fusion. We applied the proposed models to an in-house Louisiana Osteoporosis Study (LOS) cohort with 931 male subjects, including 345 African Americans and 586 Caucasians. With an analytical solution of the product of Gaussian distribution, we adopted variational inference to train the designed MVAE-PoE model to perform common latent feature extraction. We performed genome-wide association studies (GWAS) to select 256 genetic variants with the lowest p-values for each proximal femoral strength and integrated whole genome sequence (WGS) features and DXA-derived imaging features to predict proximal femoral strength. Results: The best prediction model for fall fracture load was acquired by integrating WGS features and DXA-derived imaging features. The designed models achieved the mean absolute percentage error of 18.04%, 6.84% and 7.95% for predicting proximal femoral fracture loads using linear models of fall loading, nonlinear models of fall loading, and nonlinear models of stance loading, respectively. Compared to existing multi-view information fusion methods, the proposed MVAE-PoE achieved the best performance. Conclusion: The proposed models are capable of predicting proximal femoral strength using WGS features and DXA-derived imaging features. Though this tool is not a substitute for FEA using QCT images, it would make improved assessment of hip fracture risk more widely available while avoiding the increased radiation dosage and clinical costs from QCT.
    DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis. (arXiv:2206.05782v4 [eess.IV] UPDATED)
    The cancer prognosis on gigapixel Whole-Slide Images (WSIs) has always been a challenging task. To further enhance WSI visual representations, existing methods have explored image pyramids, instead of single-resolution images, in WSIs. In spite of this, they still face two major problems: high computational cost and the unnoticed semantical gap in multi-resolution feature fusion. To tackle these problems, this paper proposes to efficiently exploit WSI pyramids from a new perspective, the dual-stream network with cross-attention (DSCA). Our key idea is to utilize two sub-streams to process the WSI patches with two resolutions, where a square pooling is devised in a high-resolution stream to significantly reduce computational costs, and a cross-attention-based method is proposed to properly handle the fusion of dual-stream features. We validate our DSCA on three publicly-available datasets with a total number of 3,101 WSIs from 1,911 patients. Our experiments and ablation studies verify that (i) the proposed DSCA could outperform existing state-of-the-art methods in cancer prognosis, by an average C-Index improvement of around 4.6%; (ii) our DSCA network is more efficient in computation -- it has more learnable parameters (6.31M vs. 860.18K) but less computational costs (2.51G vs. 4.94G), compared to a typical existing multi-resolution network. (iii) the key components of DSCA, dual-stream and cross-attention, indeed contribute to our model's performance, gaining an average C-Index rise of around 2.0% while maintaining a relatively-small computational load. Our DSCA could serve as an alternative and effective tool for WSI-based cancer prognosis.
    Predicting Thermoelectric Power Factor of Bismuth Telluride During Laser Powder Bed Fusion Additive Manufacturing. (arXiv:2303.15663v1 [cs.LG])
    An additive manufacturing (AM) process, like laser powder bed fusion, allows for the fabrication of objects by spreading and melting powder in layers until a freeform part shape is created. In order to improve the properties of the material involved in the AM process, it is important to predict the material characterization property as a function of the processing conditions. In thermoelectric materials, the power factor is a measure of how efficiently the material can convert heat to electricity. While earlier works have predicted the material characterization properties of different thermoelectric materials using various techniques, implementation of machine learning models to predict the power factor of bismuth telluride (Bi2Te3) during the AM process has not been explored. This is important as Bi2Te3 is a standard material for low temperature applications. Thus, we used data about manufacturing processing parameters involved and in-situ sensor monitoring data collected during AM of Bi2Te3, to train different machine learning models in order to predict its thermoelectric power factor. We implemented supervised machine learning techniques using 80% training and 20% test data and further used the permutation feature importance method to identify important processing parameters and in-situ sensor features which were best at predicting power factor of the material. Ensemble-based methods like random forest, AdaBoost classifier, and bagging classifier performed the best in predicting power factor with the highest accuracy of 90% achieved by the bagging classifier model. Additionally, we found the top 15 processing parameters and in-situ sensor features to characterize the material manufacturing property like power factor. These features could further be optimized to maximize power factor of the thermoelectric material and improve the quality of the products built using this material.
    Cluster-Guided Unsupervised Domain Adaptation for Deep Speaker Embedding. (arXiv:2303.15944v1 [cs.LG])
    Recent studies have shown that pseudo labels can contribute to unsupervised domain adaptation (UDA) for speaker verification. Inspired by the self-training strategies that use an existing classifier to label the unlabeled data for retraining, we propose a cluster-guided UDA framework that labels the target domain data by clustering and combines the labeled source domain data and pseudo-labeled target domain data to train a speaker embedding network. To improve the cluster quality, we train a speaker embedding network dedicated for clustering by minimizing the contrastive center loss. The goal is to reduce the distance between an embedding and its assigned cluster center while enlarging the distance between the embedding and the other cluster centers. Using VoxCeleb2 as the source domain and CN-Celeb1 as the target domain, we demonstrate that the proposed method can achieve an equal error rate (EER) of 8.10% on the CN-Celeb1 evaluation set without using any labels from the target domain. This result outperforms the supervised baseline by 39.6% and is the state-of-the-art UDA performance on this corpus.
    Explaining Exchange Rate Forecasts with Macroeconomic Fundamentals Using Interpretive Machine Learning. (arXiv:2303.16149v1 [q-fin.ST])
    The complexity and ambiguity of financial and economic systems, along with frequent changes in the economic environment, have made it difficult to make precise predictions that are supported by theory-consistent explanations. Interpreting the prediction models used for forecasting important macroeconomic indicators is highly valuable for understanding relations among different factors, increasing trust towards the prediction models, and making predictions more actionable. In this study, we develop a fundamental-based model for the Canadian-U.S. dollar exchange rate within an interpretative framework. We propose a comprehensive approach using machine learning to predict the exchange rate and employ interpretability methods to accurately analyze the relationships among macroeconomic variables. Moreover, we implement an ablation study based on the output of the interpretations to improve the predictive accuracy of the models. Our empirical results show that crude oil, as Canada's main commodity export, is the leading factor that determines the exchange rate dynamics with time-varying effects. The changes in the sign and magnitude of the contributions of crude oil to the exchange rate are consistent with significant events in the commodity and energy markets and the evolution of the crude oil trend in Canada. Gold and the TSX stock index are found to be the second and third most important variables that influence the exchange rate. Accordingly, this analysis provides trustworthy and practical insights for policymakers and economists and accurate knowledge about the predictive model's decisions, which are supported by theoretical considerations.
    Forecasting localized weather impacts on vegetation as seen from space with meteo-guided video prediction. (arXiv:2303.16198v1 [cs.CV])
    We present a novel approach for modeling vegetation response to weather in Europe as measured by the Sentinel 2 satellite. Existing satellite imagery forecasting approaches focus on photorealistic quality of the multispectral images, while derived vegetation dynamics have not yet received as much attention. We leverage both spatial and temporal context by extending state-of-the-art video prediction methods with weather guidance. We extend the EarthNet2021 dataset to be suitable for vegetation modeling by introducing a learned cloud mask and an appropriate evaluation scheme. Qualitative and quantitative experiments demonstrate superior performance of our approach over a wide variety of baseline methods, including leading approaches to satellite imagery forecasting. Additionally, we show how our modeled vegetation dynamics can be leveraged in a downstream task: inferring gross primary productivity for carbon monitoring. To the best of our knowledge, this work presents the first models for continental-scale vegetation modeling at fine resolution able to capture anomalies beyond the seasonal cycle, thereby paving the way for predictive assessments of vegetation status.
    Efficient Alternating Minimization Solvers for Wyner Multi-View Unsupervised Learning. (arXiv:2303.15866v1 [cs.IT])
    In this work, we adopt Wyner common information framework for unsupervised multi-view representation learning. Within this framework, we propose two novel formulations that enable the development of computational efficient solvers based on the alternating minimization principle. The first formulation, referred to as the {\em variational form}, enjoys a linearly growing complexity with the number of views and is based on a variational-inference tight surrogate bound coupled with a Lagrangian optimization objective function. The second formulation, i.e., the {\em representational form}, is shown to include known results as special cases. Here, we develop a tailored version from the alternating direction method of multipliers (ADMM) algorithm for solving the resulting non-convex optimization problem. In the two cases, the convergence of the proposed solvers is established in certain relevant regimes. Furthermore, our empirical results demonstrate the effectiveness of the proposed methods as compared with the state-of-the-art solvers. In a nutshell, the proposed solvers offer computational efficiency, theoretical convergence guarantees, scalable complexity with the number of views, and exceptional accuracy as compared with the state-of-the-art techniques. Our focus here is devoted to the discrete case and our results for continuous distributions are reported elsewhere.
    Quantifying Spatial Under-reporting Disparities in Resident Crowdsourcing. (arXiv:2204.08620v2 [stat.AP] UPDATED)
    Modern city governance relies heavily on crowdsourcing ("co-production") to identify problems such as downed trees and power lines. A major concern is that residents do not report problems at the same rates, with reporting heterogeneity directly translating to downstream disparities in how quickly incidents can be addressed. Measuring such under-reporting is a difficult statistical task, as, by definition, we do not observe incidents that are not reported or when reported incidents first occurred. Thus, low reporting rates and low ground-truth incident rates cannot be naively distinguished. We develop a method to identify (heterogeneous) reporting rates, without using external ground truth data. Our insight is that rates on $\textit{duplicate}$ reports about the same incident can be leveraged to disambiguate whether an incident has occurred with its reporting rate once it has occurred. Using this idea, we reduce the question to a standard Poisson rate estimation task -- even though the full incident reporting interval is also unobserved. We apply our method to over 100,000 resident reports made to the New York City Department of Parks and Recreation and to over 900,000 reports made to the Chicago Department of Transportation and Department of Water Management, finding that there are substantial spatial disparities in reporting rates even after controlling for incident characteristics -- some neighborhoods report three times as quickly as do others. These spatial disparities correspond to socio-economic characteristics: in NYC, higher population density, fraction of people with college degrees, income, and fraction of population that is White all positively correlate with reporting rates.
    Task Phasing: Automated Curriculum Learning from Demonstrations. (arXiv:2210.10999v2 [cs.LG] UPDATED)
    Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. Common RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches have been studied in detail, they have rarely been considered together. This paper aims to do so by introducing a principled task phasing approach that uses demonstrations to automatically generate a curriculum sequence. Using inverse RL from (suboptimal) demonstrations we define a simple initial task. Our task phasing approach then provides a framework to gradually increase the complexity of the task all the way to the target task, while retuning the RL agent in each phasing iteration. Two approaches for phasing are considered: (1) gradually increasing the proportion of time steps an RL agent is in control, and (2) phasing out a guiding informative reward function. We present conditions that guarantee the convergence of these approaches to an optimal policy. Experimental results on 3 sparse reward domains demonstrate that our task phasing approaches outperform state-of-the-art approaches with respect to asymptotic performance.
    Do Neural Topic Models Really Need Dropout? Analysis of the Effect of Dropout in Topic Modeling. (arXiv:2303.15973v1 [cs.CL])
    Dropout is a widely used regularization trick to resolve the overfitting issue in large feedforward neural networks trained on a small dataset, which performs poorly on the held-out test subset. Although the effectiveness of this regularization trick has been extensively studied for convolutional neural networks, there is a lack of analysis of it for unsupervised models and in particular, VAE-based neural topic models. In this paper, we have analyzed the consequences of dropout in the encoder as well as in the decoder of the VAE architecture in three widely used neural topic models, namely, contextualized topic model (CTM), ProdLDA, and embedded topic model (ETM) using four publicly available datasets. We characterize the dropout effect on these models in terms of the quality and predictive performance of the generated topics.
    Forecasting Large Realized Covariance Matrices: The Benefits of Factor Models and Shrinkage. (arXiv:2303.16151v1 [q-fin.ST])
    We propose a model to forecast large realized covariance matrices of returns, applying it to the constituents of the S\&P 500 daily. To address the curse of dimensionality, we decompose the return covariance matrix using standard firm-level factors (e.g., size, value, and profitability) and use sectoral restrictions in the residual covariance matrix. This restricted model is then estimated using vector heterogeneous autoregressive (VHAR) models with the least absolute shrinkage and selection operator (LASSO). Our methodology improves forecasting precision relative to standard benchmarks and leads to better estimates of minimum variance portfolios.
    Enhancement Encoding: A Novel Imbalanced Classification Approach via Encoding the Training Labels. (arXiv:2208.11056v2 [cs.LG] UPDATED)
    Class imbalance, which is also called long-tailed distribution, is a common problem in classification tasks based on machine learning. If it happens, the minority data will be overwhelmed by the majority, which presents quite a challenge for data science. To address the class imbalance problem, researchers have proposed lots of methods: some people make the data set balanced (SMOTE), some others refine the loss function (Focal Loss), and even someone has noticed the value of labels influences class-imbalanced learning (Yang and Xu. Rethinking the value of labels for improving class-imbalanced learning. In NeurIPS 2020), but no one changes the way to encode the labels of data yet. Nowadays, the most prevailing technique to encode labels is the one-hot encoding due to its nice performance in the general situation. However, it is not a good choice for imbalanced data, because the classifier will treat majority and minority samples equally. In this paper, we innovatively propose the enhancement encoding technique, which is specially designed for the imbalanced classification. The enhancement encoding combines re-weighting and cost-sensitiveness, which can reflect the difference between hard and easy (or minority and majority) classes. To reduce the number of validation samples and the computation cost, we also replace the confusion matrix with a novel soft-confusion matrix which works better with a small validation set. In the experiments, we evaluate the enhancement encoding with three different types of loss. And the results show that enhancement encoding is very effective to improve the performance of the network trained with imbalanced data. Particularly, the performance on minority classes is much better.
    DisWOT: Student Architecture Search for Distillation WithOut Training. (arXiv:2303.15678v1 [cs.CV])
    Knowledge distillation (KD) is an effective training strategy to improve the lightweight student models under the guidance of cumbersome teachers. However, the large architecture difference across the teacher-student pairs limits the distillation gains. In contrast to previous adaptive distillation methods to reduce the teacher-student gap, we explore a novel training-free framework to search for the best student architectures for a given teacher. Our work first empirically show that the optimal model under vanilla training cannot be the winner in distillation. Secondly, we find that the similarity of feature semantics and sample relations between random-initialized teacher-student networks have good correlations with final distillation performances. Thus, we efficiently measure similarity matrixs conditioned on the semantic activation maps to select the optimal student via an evolutionary algorithm without any training. In this way, our student architecture search for Distillation WithOut Training (DisWOT) significantly improves the performance of the model in the distillation stage with at least 180$\times$ training acceleration. Additionally, we extend similarity metrics in DisWOT as new distillers and KD-based zero-proxies. Our experiments on CIFAR, ImageNet and NAS-Bench-201 demonstrate that our technique achieves state-of-the-art results on different search spaces. Our project and code are available at https://lilujunai.github.io/DisWOT-CVPR2023/.
    One Transformer Can Understand Both 2D & 3D Molecular Data. (arXiv:2210.01765v4 [cs.LG] UPDATED)
    Unlike vision and language data which usually has a unique format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in a 3D space. For molecular representation learning, most previous works designed neural networks only for a particular data format, making the learned models likely to fail for other data formats. We believe a general-purpose neural network model for chemistry should be able to handle molecular tasks across data modalities. To achieve this goal, in this work, we develop a novel Transformer-based Molecular model called Transformer-M, which can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations. Using the standard Transformer as the backbone architecture, Transformer-M develops two separated channels to encode 2D and 3D structural information and incorporate them with the atom features in the network modules. When the input data is in a particular format, the corresponding channel will be activated, and the other will be disabled. By training on 2D and 3D molecular data with properly designed supervised signals, Transformer-M automatically learns to leverage knowledge from different data modalities and correctly capture the representations. We conducted extensive experiments for Transformer-M. All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks, suggesting its broad applicability. The code and models will be made publicly available at https://github.com/lsj2408/Transformer-M.
    Parameter-Efficient Tuning Makes a Good Classification Head. (arXiv:2210.16771v2 [cs.CL] UPDATED)
    In recent years, pretrained models revolutionized the paradigm of natural language understanding (NLU), where we append a randomly initialized classification head after the pretrained backbone, e.g. BERT, and finetune the whole model. As the pretrained backbone makes a major contribution to the improvement, we naturally expect a good pretrained classification head can also benefit the training. However, the final-layer output of the backbone, i.e. the input of the classification head, will change greatly during finetuning, making the usual head-only pretraining (LP-FT) ineffective. In this paper, we find that parameter-efficient tuning makes a good classification head, with which we can simply replace the randomly initialized heads for a stable performance gain. Our experiments demonstrate that the classification head jointly pretrained with parameter-efficient tuning consistently improves the performance on 9 tasks in GLUE and SuperGLUE.
    NoiseGrad: Enhancing Explanations by Introducing Stochasticity to Model Weights. (arXiv:2106.10185v3 [cs.LG] CROSS LISTED)
    Many efforts have been made for revealing the decision-making process of black-box learning machines such as deep neural networks, resulting in useful local and global explanation methods. For local explanation, stochasticity is known to help: a simple method, called SmoothGrad, has improved the visual quality of gradient-based attribution by adding noise to the input space and averaging the explanations of the noisy inputs. In this paper, we extend this idea and propose NoiseGrad that enhances both local and global explanation methods. Specifically, NoiseGrad introduces stochasticity in the weight parameter space, such that the decision boundary is perturbed. NoiseGrad is expected to enhance the local explanation, similarly to SmoothGrad, due to the dual relationship between the input perturbation and the decision boundary perturbation. We evaluate NoiseGrad and its fusion with SmoothGrad -- FusionGrad -- qualitatively and quantitatively with several evaluation criteria, and show that our novel approach significantly outperforms the baseline methods. Both NoiseGrad and FusionGrad are method-agnostic and as handy as SmoothGrad using a simple heuristic for the choice of the hyperparameter setting without the need of finetuning.
    Your Diffusion Model is Secretly a Zero-Shot Classifier. (arXiv:2303.16203v1 [cs.LG])
    The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. We also find that our diffusion-based approach has stronger multimodal relational reasoning abilities than competing contrastive approaches. Finally, we evaluate diffusion models trained on ImageNet and find that they approach the performance of SOTA discriminative classifiers trained on the same dataset, even with weak augmentations and no regularization. Results and visualizations at https://diffusion-classifier.github.io/
    Deep Curvilinear Editing: Commutative and Nonlinear Image Manipulation for Pretrained Deep Generative Model. (arXiv:2211.14573v2 [cs.CV] UPDATED)
    Semantic editing of images is the fundamental goal of computer vision. Although deep learning methods, such as generative adversarial networks (GANs), are capable of producing high-quality images, they often do not have an inherent way of editing generated images semantically. Recent studies have investigated a way of manipulating the latent variable to determine the images to be generated. However, methods that assume linear semantic arithmetic have certain limitations in terms of the quality of image editing, whereas methods that discover nonlinear semantic pathways provide non-commutative editing, which is inconsistent when applied in different orders. This study proposes a novel method called deep curvilinear editing (DeCurvEd) to determine semantic commuting vector fields on the latent space. We theoretically demonstrate that owing to commutativity, the editing of multiple attributes depends only on the quantities and not on the order. Furthermore, we experimentally demonstrate that compared to previous methods, the nonlinear and commutative nature of DeCurvEd facilitates the disentanglement of image attributes and provides higher-quality editing.
    Deep Riemannian Networks for EEG Decoding. (arXiv:2212.10426v4 [cs.LG] UPDATED)
    State-of-the-art performance in electroencephalography (EEG) decoding tasks is currently often achieved with either Deep-Learning or Riemannian-Geometry-based decoders. Recently, there is growing interest in Deep Riemannian Networks (DRNs) possibly combining the advantages of both previous classes of methods. However, there are still a range of topics where additional insight is needed to pave the way for a more widespread application of DRNs in EEG. These include architecture design questions such as network size and end-to-end ability as well as model training questions. How these factors affect model performance has not been explored. Additionally, it is not clear how the data within these networks is transformed, and whether this would correlate with traditional EEG decoding. Our study aims to lay the groundwork in the area of these topics through the analysis of DRNs for EEG with a wide range of hyperparameters. Networks were tested on two public EEG datasets and compared with state-of-the-art ConvNets. Here we propose end-to-end EEG SPDNet (EE(G)-SPDNet), and we show that this wide, end-to-end DRN can outperform the ConvNets, and in doing so use physiologically plausible frequency regions. We also show that the end-to-end approach learns more complex filters than traditional band-pass filters targeting the classical alpha, beta, and gamma frequency bands of the EEG, and that performance can benefit from channel specific filtering approaches. Additionally, architectural analysis revealed areas for further improvement due to the possible loss of Riemannian specific information throughout the network. Our study thus shows how to design and train DRNs to infer task-related information from the raw EEG without the need of handcrafted filterbanks and highlights the potential of end-to-end DRNs such as EE(G)-SPDNet for high-performance EEG decoding.
    Fast Convergence Federated Learning with Aggregated Gradients. (arXiv:2303.15799v1 [cs.LG])
    Federated Learning (FL) is a novel machine learning framework, which enables multiple distributed devices cooperatively training a shared model scheduled by a central server while protecting private data locally. However, the non-independent-and-identically-distributed (Non-IID) data samples and frequent communication among participants will slow down the convergent rate and increase communication costs. To achieve fast convergence, we ameliorate the local gradient descend approach in conventional local update rule by introducing the aggregated gradients at each local update epoch, and propose an adaptive learning rate algorithm that further takes the deviation of local parameter and global parameter into consideration at each iteration. The above strategy requires all clients' local parameters and gradients at each local iteration, which is challenging as there is no communication during local update epochs. Accordingly, we utilize mean field approach by introducing two mean field terms to estimate the average local parameters and gradients respectively, which does not require clients to exchange their private information with each other at each local update epoch. Numerical results show that our proposed framework is superior to the state-of-art schemes in model accuracy and convergent rate on both IID and Non-IID dataset.
    Embedding Contextual Information through Reward Shaping in Multi-Agent Learning: A Case Study from Google Football. (arXiv:2303.15471v1 [cs.LG])
    Artificial Intelligence has been used to help human complete difficult tasks in complicated environments by providing optimized strategies for decision-making or replacing the manual labour. In environments including multiple agents, such as football, the most common methods to train agents are Imitation Learning and Multi-Agent Reinforcement Learning (MARL). However, the agents trained by Imitation Learning cannot outperform the expert demonstrator, which makes humans hardly get new insights from the learnt policy. Besides, MARL is prone to the credit assignment problem. In environments with sparse reward signal, this method can be inefficient. The objective of our research is to create a novel reward shaping method by embedding contextual information in reward function to solve the aforementioned challenges. We demonstrate this in the Google Research Football (GRF) environment. We quantify the contextual information extracted from game state observation and use this quantification together with original sparse reward to create the shaped reward. The experiment results in the GRF environment prove that our reward shaping method is a useful addition to state-of-the-art MARL algorithms for training agents in environments with sparse reward signal.
    Conditional Generative Models are Provably Robust: Pointwise Guarantees for Bayesian Inverse Problems. (arXiv:2303.15845v1 [cs.LG])
    Conditional generative models became a very powerful tool to sample from Bayesian inverse problem posteriors. It is well-known in classical Bayesian literature that posterior measures are quite robust with respect to perturbations of both the prior measure and the negative log-likelihood, which includes perturbations of the observations. However, to the best of our knowledge, the robustness of conditional generative models with respect to perturbations of the observations has not been investigated yet. In this paper, we prove for the first time that appropriately learned conditional generative models provide robust results for single observations.
    Variational Distribution Learning for Unsupervised Text-to-Image Generation. (arXiv:2303.16105v1 [cs.CV])
    We propose a text-to-image generation algorithm based on deep neural networks when text captions for images are unavailable during training. In this work, instead of simply generating pseudo-ground-truth sentences of training images using existing image captioning methods, we employ a pretrained CLIP model, which is capable of properly aligning embeddings of images and corresponding texts in a joint space and, consequently, works well on zero-shot recognition tasks. We optimize a text-to-image generation model by maximizing the data log-likelihood conditioned on pairs of image-text CLIP embeddings. To better align data in the two domains, we employ a principled way based on a variational inference, which efficiently estimates an approximate posterior of the hidden text embedding given an image and its CLIP feature. Experimental results validate that the proposed framework outperforms existing approaches by large margins under unsupervised and semi-supervised text-to-image generation settings.
    Adaptive Voronoi NeRFs. (arXiv:2303.16001v1 [cs.CV])
    Neural Radiance Fields (NeRFs) learn to represent a 3D scene from just a set of registered images. Increasing sizes of a scene demands more complex functions, typically represented by neural networks, to capture all details. Training and inference then involves querying the neural network millions of times per image, which becomes impractically slow. Since such complex functions can be replaced by multiple simpler functions to improve speed, we show that a hierarchy of Voronoi diagrams is a suitable choice to partition the scene. By equipping each Voronoi cell with its own NeRF, our approach is able to quickly learn a scene representation. We propose an intuitive partitioning of the space that increases quality gains during training by distributing information evenly among the networks and avoids artifacts through a top-down adaptive refinement. Our framework is agnostic to the underlying NeRF method and easy to implement, which allows it to be applied to various NeRF variants for improved learning and rendering speeds.
    From Plate to Prevention: A Dietary Nutrient-aided Platform for Health Promotion in Singapore. (arXiv:2301.03829v2 [cs.LG] UPDATED)
    Singapore has been striving to improve the provision of healthcare services to her people. In this course, the government has taken note of the deficiency in regulating and supervising people's nutrient intake, which is identified as a contributing factor to the development of chronic diseases. Consequently, this issue has garnered significant attention. In this paper, we share our experience in addressing this issue and attaining medical-grade nutrient intake information to benefit Singaporeans in different aspects. To this end, we develop the FoodSG platform to incubate diverse healthcare-oriented applications as a service in Singapore, taking into account their shared requirements. We further identify the profound meaning of localized food datasets and systematically clean and curate a localized Singaporean food dataset FoodSG-233. To overcome the hurdle in recognition performance brought by Singaporean multifarious food dishes, we propose to integrate supervised contrastive learning into our food recognition model FoodSG-SCL for the intrinsic capability to mine hard positive/negative samples and therefore boost the accuracy. Through a comprehensive evaluation, we present performance results of the proposed model and insights on food-related healthcare applications. The FoodSG-233 dataset has been released in https://foodlg.comp.nus.edu.sg/.
    CGC: Contrastive Graph Clustering for Community Detection and Tracking. (arXiv:2204.08504v4 [cs.SI] UPDATED)
    Given entities and their interactions in the web data, which may have occurred at different time, how can we find communities of entities and track their evolution? In this paper, we approach this important task from graph clustering perspective. Recently, state-of-the-art clustering performance in various domains has been achieved by deep clustering methods. Especially, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences in modeling choices (e.g., encoder architectures), existing DGC methods are mainly based on autoencoders and use the same clustering objective with relatively minor adaptations. Also, while many real-world graphs are dynamic, previous DGC methods considered only static graphs. In this work, we develop CGC, a novel end-to-end framework for graph clustering, which fundamentally differs from existing methods. CGC learns node embeddings and cluster assignments in a contrastive graph learning framework, where positive and negative samples are carefully selected in a multi-level scheme such that they reflect hierarchical community structures and network homophily. Also, we extend CGC for time-evolving data, where temporal graph clustering is performed in an incremental learning fashion, with the ability to detect change points. Extensive evaluation on real-world graphs demonstrates that the proposed CGC consistently outperforms existing methods.
    Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks. (arXiv:2303.15991v1 [cs.LG])
    The increasingly deeper neural networks hinder the democratization of privacy-enhancing distributed learning, such as federated learning (FL), to resource-constrained devices. To overcome this challenge, in this paper, we advocate the integration of edge computing paradigm and parallel split learning (PSL), allowing multiple client devices to offload substantial training workloads to an edge server via layer-wise model split. By observing that existing PSL schemes incur excessive training latency and large volume of data transmissions, we propose an innovative PSL framework, namely, efficient parallel split learning (EPSL), to accelerate model training. To be specific, EPSL parallelizes client-side model training and reduces the dimension of local gradients for back propagation (BP) via last-layer gradient aggregation, leading to a significant reduction in server-side training and communication latency. Moreover, by considering the heterogeneous channel conditions and computing capabilities at client devices, we jointly optimize subchannel allocation, power control, and cut layer selection to minimize the per-round latency. Simulation results show that the proposed EPSL framework significantly decreases the training latency needed to achieve a target accuracy compared with the state-of-the-art benchmarks, and the tailored resource management and layer split strategy can considerably reduce latency than the counterpart without optimization.
    Planning with Sequence Models through Iterative Energy Minimization. (arXiv:2303.16189v1 [cs.LG])
    Recent works have shown that sequence modeling can be effectively used to train reinforcement learning (RL) policies. However, the success of applying existing sequence models to planning, in which we wish to obtain a trajectory of actions to reach some goal, is less straightforward. The typical autoregressive generation procedures of sequence models preclude sequential refinement of earlier steps, which limits the effectiveness of a predicted plan. In this paper, we suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization, and illustrate how such a procedure leads to improved RL performance across different tasks. We train a masked language model to capture an implicit energy function over trajectories of actions, and formulate planning as finding a trajectory of actions with minimum energy. We illustrate how this procedure enables improved performance over recent approaches across BabyAI and Atari environments. We further demonstrate unique benefits of our iterative optimization procedure, involving new task generalization, test-time constraints adaptation, and the ability to compose plans together. Project website: https://hychen-naza.github.io/projects/LEAP
    The Wyner Variational Autoencoder for Unsupervised Multi-Layer Wireless Fingerprinting. (arXiv:2303.15860v1 [cs.IT])
    Wireless fingerprinting refers to a device identification method leveraging hardware imperfections and wireless channel variations as signatures. Beyond physical layer characteristics, recent studies demonstrated that user behaviours could be identified through network traffic, e.g., packet length, without decryption of the payload. Inspired by these results, we propose a multi-layer fingerprinting framework that jointly considers the multi-layer signatures for improved identification performance. In contrast to previous works, by leveraging the recent multi-view machine learning paradigm, i.e., data with multiple forms, our method can cluster the device information shared among the multi-layer features without supervision. Our information-theoretic approach can be extended to supervised and semi-supervised settings with straightforward derivations. In solving the formulated problem, we obtain a tight surrogate bound using variational inference for efficient optimization. In extracting the shared device information, we develop an algorithm based on the Wyner common information method, enjoying reduced computation complexity as compared to existing approaches. The algorithm can be applied to data distributions belonging to the exponential family class. Empirically, we evaluate the algorithm in a synthetic dataset with real-world video traffic and simulated physical layer characteristics. Our empirical results show that the proposed method outperforms the state-of-the-art baselines in both supervised and unsupervised settings.
    Deep Learning on a Data Diet: Finding Important Examples Early in Training. (arXiv:2107.07075v2 [cs.LG] UPDATED)
    Recent success in deep learning has partially been driven by training increasingly overparametrized networks on ever larger datasets. It is therefore natural to ask: how much of the data is superfluous, which examples are important for generalization, and how do we find them? In this work, we make the striking observation that, in standard vision datasets, simple scores averaged over several weight initializations can be used to identify important examples very early in training. We propose two such scores -- the Gradient Normed (GraNd) and the Error L2-Norm (EL2N) scores -- and demonstrate their efficacy on a range of architectures and datasets by pruning significant fractions of training data without sacrificing test accuracy. In fact, using EL2N scores calculated a few epochs into training, we can prune half of the CIFAR10 training set while slightly improving test accuracy. Furthermore, for a given dataset, EL2N scores from one architecture or hyperparameter configuration generalize to other configurations. Compared to recent work that prunes data by discarding examples that are rarely forgotten over the course of training, our scores use only local information early in training. We also use our scores to detect noisy examples and study training dynamics through the lens of important examples -- we investigate how the data distribution shapes the loss surface and identify subspaces of the model's data representation that are relatively stable over training.
    Repulsive Deep Ensembles are Bayesian. (arXiv:2106.11642v3 [cs.LG] UPDATED)
    Deep ensembles have recently gained popularity in the deep learning community for their conceptual simplicity and efficiency. However, maintaining functional diversity between ensemble members that are independently trained with gradient descent is challenging. This can lead to pathologies when adding more ensemble members, such as a saturation of the ensemble performance, which converges to the performance of a single model. Moreover, this does not only affect the quality of its predictions, but even more so the uncertainty estimates of the ensemble, and thus its performance on out-of-distribution data. We hypothesize that this limitation can be overcome by discouraging different ensemble members from collapsing to the same function. To this end, we introduce a kernelized repulsive term in the update rule of the deep ensembles. We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms the maximum a posteriori inference into proper Bayesian inference. Namely, we show that the training dynamics of our proposed repulsive ensembles follow a Wasserstein gradient flow of the KL divergence with the true posterior. We study repulsive terms in weight and function space and empirically compare their performance to standard ensembles and Bayesian baselines on synthetic and real-world prediction tasks.
    Pareto Optimization of a Laser Wakefield Accelerator. (arXiv:2303.15825v1 [physics.acc-ph])
    Optimization of accelerator performance parameters is limited by numerous trade-offs and finding the appropriate balance between optimization goals for an unknown system is challenging to achieve. Here we show that multi-objective Bayesian optimization can map the solution space of a laser wakefield accelerator in a very sample-efficient way. Using a Gaussian mixture model, we isolate contributions related to an electron bunch at a certain energy and we observe that there exists a wide range of Pareto-optimal solutions that trade beam energy versus charge at similar laser-to-beam efficiency. However, many applications such as light sources require particle beams at a certain target energy. Once such a constraint is introduced we observe a direct trade-off between energy spread and accelerator efficiency. We furthermore demonstrate how specific solutions can be exploited using \emph{a posteriori} scalarization of the objectives, thereby efficiently splitting the exploration and exploitation phases.
    Learnability, Sample Complexity, and Hypothesis Class Complexity for Regression Models. (arXiv:2303.16091v1 [stat.ML])
    The goal of a learning algorithm is to receive a training data set as input and provide a hypothesis that can generalize to all possible data points from a domain set. The hypothesis is chosen from hypothesis classes with potentially different complexities. Linear regression modeling is an important category of learning algorithms. The practical uncertainty of the target samples affects the generalization performance of the learned model. Failing to choose a proper model or hypothesis class can lead to serious issues such as underfitting or overfitting. These issues have been addressed by alternating cost functions or by utilizing cross-validation methods. These approaches can introduce new hyperparameters with their own new challenges and uncertainties or increase the computational complexity of the learning algorithm. On the other hand, the theory of probably approximately correct (PAC) aims at defining learnability based on probabilistic settings. Despite its theoretical value, PAC does not address practical learning issues on many occasions. This work is inspired by the foundation of PAC and is motivated by the existing regression learning issues. The proposed approach, denoted by epsilon-Confidence Approximately Correct (epsilon CoAC), utilizes Kullback Leibler divergence (relative entropy) and proposes a new related typical set in the set of hyperparameters to tackle the learnability issue. Moreover, it enables the learner to compare hypothesis classes of different complexity orders and choose among them the optimum with the minimum epsilon in the epsilon CoAC framework. Not only the epsilon CoAC learnability overcomes the issues of overfitting and underfitting, but it also shows advantages and superiority over the well known cross-validation method in the sense of time consumption as well as in the sense of accuracy.
    Denoising Autoencoder-based Defensive Distillation as an Adversarial Robustness Algorithm. (arXiv:2303.15901v1 [cs.LG])
    Adversarial attacks significantly threaten the robustness of deep neural networks (DNNs). Despite the multiple defensive methods employed, they are nevertheless vulnerable to poison attacks, where attackers meddle with the initial training data. In order to defend DNNs against such adversarial attacks, this work proposes a novel method that combines the defensive distillation mechanism with a denoising autoencoder (DAE). This technique tries to lower the sensitivity of the distilled model to poison attacks by spotting and reconstructing poisonous adversarial inputs in the training data. We added carefully created adversarial samples to the initial training data to assess the proposed method's performance. Our experimental findings demonstrate that our method successfully identified and reconstructed the poisonous inputs while also considering enhancing the DNN's resilience. The proposed approach provides a potent and robust defense mechanism for DNNs in various applications where data poisoning attacks are a concern. Thus, the defensive distillation technique's limitation posed by poisonous adversarial attacks is overcome.
    Lazy learning: a biologically-inspired plasticity rule for fast and energy efficient synaptic plasticity. (arXiv:2303.16067v1 [cs.NE])
    When training neural networks for classification tasks with backpropagation, parameters are updated on every trial, even if the sample is classified correctly. In contrast, humans concentrate their learning effort on errors. Inspired by human learning, we introduce lazy learning, which only learns on incorrect samples. Lazy learning can be implemented in a few lines of code and requires no hyperparameter tuning. Lazy learning achieves state-of-the-art performance and is particularly suited when datasets are large. For instance, it reaches 99.2% test accuracy on Extended MNIST using a single-layer MLP, and does so 7.6x faster than a matched backprop network
    Measuring Classification Decision Certainty and Doubt. (arXiv:2303.14568v2 [stat.ML] UPDATED)
    Quantitative characterizations and estimations of uncertainty are of fundamental importance in optimization and decision-making processes. Herein, we propose intuitive scores, which we call certainty and doubt, that can be used in both a Bayesian and frequentist framework to assess and compare the quality and uncertainty of predictions in (multi-)classification decision machine learning problems.
    Uncovering Bias in Personal Informatics. (arXiv:2303.15592v1 [cs.CY])
    Personal informatics (PI) systems, powered by smartphones and wearables, enable people to lead healthier lifestyles by providing meaningful and actionable insights that break down barriers between users and their health information. Today, such systems are used by billions of users for monitoring not only physical activity and sleep but also vital signs and women's and heart health, among others. %Despite their widespread usage, the processing of particularly sensitive personal data, and their proximity to domains known to be susceptible to bias, such as healthcare, bias in PI has not been investigated systematically. Despite their widespread usage, the processing of sensitive PI data may suffer from biases, which may entail practical and ethical implications. In this work, we present the first comprehensive empirical and analytical study of bias in PI systems, including biases in raw data and in the entire machine learning life cycle. We use the most detailed framework to date for exploring the different sources of bias and find that biases exist both in the data generation and the model learning and implementation streams. According to our results, the most affected minority groups are users with health issues, such as diabetes, joint issues, and hypertension, and female users, whose data biases are propagated or even amplified by learning models, while intersectional biases can also be observed.
    Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization. (arXiv:2303.15810v1 [cs.LG])
    Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit the deviation from the behavior policy as computing $Q$-values using out-of-distribution (OOD) actions will suffer from errors due to distributional shift. The recently proposed \textit{In-sample Learning} paradigm (i.e., IQL), which improves the policy by quantile regression using only data samples, shows great promise because it learns an optimal policy without querying the value function of any unseen actions. However, it remains unclear how this type of method handles the distributional shift in learning the value function. In this work, we make a key finding that the in-sample learning paradigm arises under the \textit{Implicit Value Regularization} (IVR) framework. This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy. Based on the IVR framework, we further propose two practical algorithms, Sparse $Q$-learning (SQL) and Exponential $Q$-learning (EQL), which adopt the same value regularization used in existing works, but in a complete in-sample manner. Compared with IQL, we find that our algorithms introduce sparsity in learning the value function, making them more robust in noisy data regimes. We also verify the effectiveness of SQL and EQL on D4RL benchmark datasets and show the benefits of in-sample learning by comparing them with CQL in small data regimes.
    GNN-Assisted Phase Space Integration with Application to Atomistics. (arXiv:2303.16088v1 [physics.comp-ph])
    Overcoming the time scale limitations of atomistics can be achieved by switching from the state-space representation of Molecular Dynamics (MD) to a statistical-mechanics-based representation in phase space, where approximations such as maximum-entropy or Gaussian phase packets (GPP) evolve the atomistic ensemble in a time-coarsened fashion. In practice, this requires the computation of expensive high-dimensional integrals over all of phase space of an atomistic ensemble. This, in turn, is commonly accomplished efficiently by low-order numerical quadrature. We show that numerical quadrature in this context, unfortunately, comes with a set of inherent problems, which corrupt the accuracy of simulations -- especially when dealing with crystal lattices with imperfections. As a remedy, we demonstrate that Graph Neural Networks, trained on Monte-Carlo data, can serve as a replacement for commonly used numerical quadrature rules, overcoming their deficiencies and significantly improving the accuracy. This is showcased by three benchmarks: the thermal expansion of copper, the martensitic phase transition of iron, and the energy of grain boundaries. We illustrate the benefits of the proposed technique over classically used third- and fifth-order Gaussian quadrature, we highlight the impact on time-coarsened atomistic predictions, and we discuss the computational efficiency. The latter is of general importance when performing frequent evaluation of phase space or other high-dimensional integrals, which is why the proposed framework promises applications beyond the scope of atomistics.
    Predicting Adverse Neonatal Outcomes for Preterm Neonates with Multi-Task Learning. (arXiv:2303.15656v1 [cs.LG])
    Diagnosis of adverse neonatal outcomes is crucial for preterm survival since it enables doctors to provide timely treatment. Machine learning (ML) algorithms have been demonstrated to be effective in predicting adverse neonatal outcomes. However, most previous ML-based methods have only focused on predicting a single outcome, ignoring the potential correlations between different outcomes, and potentially leading to suboptimal results and overfitting issues. In this work, we first analyze the correlations between three adverse neonatal outcomes and then formulate the diagnosis of multiple neonatal outcomes as a multi-task learning (MTL) problem. We then propose an MTL framework to jointly predict multiple adverse neonatal outcomes. In particular, the MTL framework contains shared hidden layers and multiple task-specific branches. Extensive experiments have been conducted using Electronic Health Records (EHRs) from 121 preterm neonates. Empirical results demonstrate the effectiveness of the MTL framework. Furthermore, the feature importance is analyzed for each neonatal outcome, providing insights into model interpretability.
    Are Metrics Enough? Guidelines for Communicating and Visualizing Predictive Models to Subject Matter Experts. (arXiv:2205.05749v2 [cs.HC] UPDATED)
    Presenting a predictive model's performance is a communication bottleneck that threatens collaborations between data scientists and subject matter experts. Accuracy and error metrics alone fail to tell the whole story of a model - its risks, strengths, and limitations - making it difficult for subject matter experts to feel confident in their decision to use a model. As a result, models may fail in unexpected ways or go entirely unused, as subject matter experts disregard poorly presented models in favor of familiar, yet arguably substandard methods. In this paper, we describe an iterative study conducted with both subject matter experts and data scientists to understand the gaps in communication between these two groups. We find that, while the two groups share common goals of understanding the data and predictions of the model, friction can stem from unfamiliar terms, metrics, and visualizations - limiting the transfer of knowledge to SMEs and discouraging clarifying questions being asked during presentations. Based on our findings, we derive a set of communication guidelines that use visualization as a common medium for communicating the strengths and weaknesses of a model. We provide a demonstration of our guidelines in a regression modeling scenario and elicit feedback on their use from subject matter experts. From our demonstration, subject matter experts were more comfortable discussing a model's performance, more aware of the trade-offs for the presented model, and better equipped to assess the model's risks - ultimately informing and contextualizing the model's use beyond text and numbers.
    Projected Latent Distillation for Data-Agnostic Consolidation in Distributed Continual Learning. (arXiv:2303.15888v1 [cs.LG])
    Distributed learning on the edge often comprises self-centered devices (SCD) which learn local tasks independently and are unwilling to contribute to the performance of other SDCs. How do we achieve forward transfer at zero cost for the single SCDs? We formalize this problem as a Distributed Continual Learning scenario, where SCD adapt to local tasks and a CL model consolidates the knowledge from the resulting stream of models without looking at the SCD's private data. Unfortunately, current CL methods are not directly applicable to this scenario. We propose Data-Agnostic Consolidation (DAC), a novel double knowledge distillation method that consolidates the stream of SC models without using the original data. DAC performs distillation in the latent space via a novel Projected Latent Distillation loss. Experimental results show that DAC enables forward transfer between SCDs and reaches state-of-the-art accuracy on Split CIFAR100, CORe50 and Split TinyImageNet, both in reharsal-free and distributed CL scenarios. Somewhat surprisingly, even a single out-of-distribution image is sufficient as the only source of data during consolidation.
    Searching for long faint astronomical high energy transients: a data driven approach. (arXiv:2303.15936v1 [astro-ph.HE])
    HERMES (High Energy Rapid Modular Ensemble of Satellites) pathfinder is an in-orbit demonstration consisting of a constellation of six 3U nano-satellites hosting simple but innovative detectors for the monitoring of cosmic high-energy transients. The main objective of HERMES Pathfinder is to prove that accurate position of high-energy cosmic transients can be obtained using miniaturized hardware. The transient position is obtained by studying the delay time of arrival of the signal to different detectors hosted by nano-satellites on low Earth orbits. To this purpose, the goal is to achive an overall accuracy of a fraction of a micro-second. In this context, we need to develop novel tools to fully exploit the future scientific data output of HERMES Pathfinder. In this paper, we introduce a new framework to assess the background count rate of a space-born, high energy detector; a key step towards the identification of faint astrophysical transients. We employ a Neural Network (NN) to estimate the background lightcurves on different timescales. Subsequently, we employ a fast change-point and anomaly detection technique to isolate observation segments where statistically significant excesses in the observed count rate relative to the background estimate exist. We test the new software on archival data from the NASA Fermi Gamma-ray Burst Monitor (GBM), which has a collecting area and background level of the same order of magnitude to those of HERMES Pathfinder. The NN performances are discussed and analyzed over period of both high and low solar activity. We were able to confirm events in the Fermi/GBM catalog and found events, not present in Fermi/GBM database, that could be attributed to Solar Flares, Terrestrial Gamma-ray Flashes, Gamma-Ray Bursts, Galactic X-ray flash. Seven of these are selected and analyzed further, providing an estimate of localisation and a tentative classification.
    That Label's Got Style: Handling Label Style Bias for Uncertain Image Segmentation. (arXiv:2303.15850v1 [cs.CV])
    Segmentation uncertainty models predict a distribution over plausible segmentations for a given input, which they learn from the annotator variation in the training set. However, in practice these annotations can differ systematically in the way they are generated, for example through the use of different labeling tools. This results in datasets that contain both data variability and differing label styles. In this paper, we demonstrate that applying state-of-the-art segmentation uncertainty models on such datasets can lead to model bias caused by the different label styles. We present an updated modelling objective conditioning on labeling style for aleatoric uncertainty estimation, and modify two state-of-the-art-architectures for segmentation uncertainty accordingly. We show with extensive experiments that this method reduces label style bias, while improving segmentation performance, increasing the applicability of segmentation uncertainty models in the wild. We curate two datasets, with annotations in different label styles, which we will make publicly available along with our code upon publication.
    Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection. (arXiv:2203.02194v4 [cs.CV] UPDATED)
    In some scenarios, classifier requires detecting out-of-distribution samples far from its training data. With desirable characteristics, reconstruction autoencoder-based methods deal with this problem by using input reconstruction error as a metric of novelty vs. normality. We formulate the essence of such approach as a quadruplet domain translation with an intrinsic bias to only query for a proxy of conditional data uncertainty. Accordingly, an improvement direction is formalized as maximumly compressing the autoencoder's latent space while ensuring its reconstructive power for acting as a described domain translator. From it, strategies are introduced including semantic reconstruction, data certainty decomposition and normalized L2 distance to substantially improve original methods, which together establish state-of-the-art performance on various benchmarks, e.g., the FPR@95%TPR of CIFAR-100 vs. TinyImagenet-crop on Wide-ResNet is 0.2%. Importantly, our method works without any additional data, hard-to-implement structure, time-consuming pipeline, and even harming the classification accuracy of known classes.
    A Secure Federated Learning Framework for Residential Short Term Load Forecasting. (arXiv:2209.14547v2 [cs.CR] UPDATED)
    Smart meter measurements, though critical for accurate demand forecasting, face several drawbacks including consumers' privacy, data breach issues, to name a few. Recent literature has explored Federated Learning (FL) as a promising privacy-preserving machine learning alternative which enables collaborative learning of a model without exposing private raw data for short term load forecasting. Despite its virtue, standard FL is still vulnerable to an intractable cyber threat known as Byzantine attack carried out by faulty and/or malicious clients. Therefore, to improve the robustness of federated short-term load forecasting against Byzantine threats, we develop a state-of-the-art differentially private secured FL-based framework that ensures the privacy of the individual smart meter's data while protect the security of FL models and architecture. Our proposed framework leverages the idea of gradient quantization through the Sign Stochastic Gradient Descent (SignSGD) algorithm, where the clients only transmit the `sign' of the gradient to the control centre after local model training. As we highlight through our experiments involving benchmark neural networks with a set of Byzantine attack models, our proposed approach mitigates such threats quite effectively and thus outperforms conventional Fed-SGD models.
    Efficient Activation Function Optimization through Surrogate Modeling. (arXiv:2301.05785v2 [cs.LG] UPDATED)
    Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions in CIFAR-100 and ImageNet tasks. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization. Code is available at https://github.com/cognizant-ai-labs/aquasurf, and the benchmark datasets are at https://github.com/cognizant-ai-labs/act-bench.
    On Function-Coupled Watermarks for Deep Neural Networks. (arXiv:2302.10296v2 [cs.CV] UPDATED)
    Well-performed deep neural networks (DNNs) generally require massive labelled data and computational resources for training. Various watermarking techniques are proposed to protect such intellectual properties (IPs), wherein the DNN providers implant secret information into the model so that they can later claim IP ownership by retrieving their embedded watermarks with some dedicated trigger inputs. While promising results are reported in the literature, existing solutions suffer from watermark removal attacks, such as model fine-tuning and model pruning. In this paper, we propose a novel DNN watermarking solution that can effectively defend against the above attacks. Our key insight is to enhance the coupling of the watermark and model functionalities such that removing the watermark would inevitably degrade the model's performance on normal inputs. To this end, unlike previous methods relying on secret features learnt from out-of-distribution data, our method only uses features learnt from in-distribution data. Specifically, on the one hand, we propose to sample inputs from the original training dataset and fuse them as watermark triggers. On the other hand, we randomly mask model weights during training so that the information of our embedded watermarks spreads in the network. By doing so, model fine-tuning/pruning would not forget our function-coupled watermarks. Evaluation results on various image classification tasks show a 100\% watermark authentication success rate under aggressive watermark removal attacks, significantly outperforming existing solutions. Code is available: https://github.com/cure-lab/Function-Coupled-Watermark.
    Assessment of Reinforcement Learning for Macro Placement. (arXiv:2302.11014v2 [cs.LG] UPDATED)
    We provide open, transparent implementation and assessment of Google Brain's deep reinforcement learning approach to macro placement and its Circuit Training (CT) implementation in GitHub. We implement in open source key "blackbox" elements of CT, and clarify discrepancies between CT and Nature paper. New testcases on open enablements are developed and released. We assess CT alongside multiple alternative macro placers, with all evaluation flows and related scripts public in GitHub. Our experiments also encompass academic mixed-size placement benchmarks, as well as ablation and stability studies. We comment on the impact of Nature and CT, as well as directions for future research.
    TabRet: Pre-training Transformer-based Tabular Models for Unseen Columns. (arXiv:2303.15747v1 [cs.LG])
    We present \emph{TabRet}, a pre-trainable Transformer-based model for tabular data. TabRet is designed to work on a downstream task that contains columns not seen in pre-training. Unlike other methods, TabRet has an extra learning step before fine-tuning called \emph{retokenizing}, which calibrates feature embeddings based on the masked autoencoding loss. In experiments, we pre-trained TabRet with a large collection of public health surveys and fine-tuned it on classification tasks in healthcare, and TabRet achieved the best AUC performance on four datasets. In addition, an ablation study shows retokenizing and random shuffle augmentation of columns during pre-training contributed to performance gains.
    Multimodal Manoeuvre and Trajectory Prediction for Autonomous Vehicles Using Transformer Networks. (arXiv:2303.16109v1 [cs.LG])
    Predicting the behaviour (i.e. manoeuvre/trajectory) of other road users, including vehicles, is critical for the safe and efficient operation of autonomous vehicles (AVs), a.k.a. automated driving systems (ADSs). Due to the uncertain future behaviour of vehicles, multiple future behaviour modes are often plausible for a vehicle in a given driving scene. Therefore, multimodal prediction can provide richer information than single-mode prediction enabling AVs to perform a better risk assessment. To this end, we propose a novel multimodal prediction framework that can predict multiple plausible behaviour modes and their likelihoods. The proposed framework includes a bespoke problem formulation for manoeuvre prediction, a novel transformer-based prediction model, and a tailored training method for multimodal manoeuvre and trajectory prediction. The performance of the framework is evaluated using two public benchmark highway driving datasets, namely NGSIM and highD. The results show that the proposed framework outperforms the state-of-the-art multimodal methods in the literature in terms of prediction error and is capable of predicting plausible manoeuvre and trajectory modes.
    Fake it till you make it: Learning transferable representations from synthetic ImageNet clones. (arXiv:2212.08420v2 [cs.CV] UPDATED)
    Recent image generation models such as Stable Diffusion have exhibited an impressive ability to generate fairly realistic images starting from a simple text prompt. Could such models render real images obsolete for training image prediction models? In this paper, we answer part of this provocative question by investigating the need for real images when training models for ImageNet classification. Provided only with the class names that have been used to build the dataset, we explore the ability of Stable Diffusion to generate synthetic clones of ImageNet and measure how useful these are for training classification models from scratch. We show that with minimal and class-agnostic prompt engineering, ImageNet clones are able to close a large part of the gap between models produced by synthetic images and models trained with real images, for the several standard classification benchmarks that we consider in this study. More importantly, we show that models trained on synthetic images exhibit strong generalization properties and perform on par with models trained on real data for transfer. Project page: https://europe.naverlabs.com/imagenet-sd/
    Learning Rate Schedules in the Presence of Distribution Shift. (arXiv:2303.15634v1 [cs.LG])
    We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and we give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
    On the Equivalence Between Temporal and Static Graph Representations for Observational Predictions. (arXiv:2103.07016v2 [cs.LG] UPDATED)
    This work formalizes the associational task of predicting node attribute evolution in temporal graphs from the perspective of learning equivariant representations. We show that node representations in temporal graphs can be cast into two distinct frameworks: (a) The most popular approach, which we denote as time-and-graph, where equivariant graph (e.g., GNN) and sequence (e.g., RNN) representations are intertwined to represent the temporal evolution of node attributes in the graph; and (b) an approach that we denote as time-then-graph, where the sequences describing the node and edge dynamics are represented first, then fed as node and edge attributes into a static equivariant graph representation that comes after. Interestingly, we show that time-then-graph representations have an expressivity advantage over time-and-graph representations when both use component GNNs that are not most-expressive (e.g., 1-Weisfeiler-Lehman GNNs). Moreover, while our goal is not necessarily to obtain state-of-the-art results, our experiments show that time-then-graph methods are capable of achieving better performance and efficiency than state-of-the-art time-and-graph methods in some real-world tasks, thereby showcasing that the time-then-graph framework is a worthy addition to the graph ML toolbox.
    Numerical Methods for Convex Multistage Stochastic Optimization. (arXiv:2303.15672v1 [math.OC])
    Optimization problems involving sequential decisions in a stochastic environment were studied in Stochastic Programming (SP), Stochastic Optimal Control (SOC) and Markov Decision Processes (MDP). In this paper we mainly concentrate on SP and SOC modelling approaches. In these frameworks there are natural situations when the considered problems are convex. Classical approach to sequential optimization is based on dynamic programming. It has the problem of the so-called ``Curse of Dimensionality", in that its computational complexity increases exponentially with increase of dimension of state variables. Recent progress in solving convex multistage stochastic problems is based on cutting planes approximations of the cost-to-go (value) functions of dynamic programming equations. Cutting planes type algorithms in dynamical settings is one of the main topics of this paper. We also discuss Stochastic Approximation type methods applied to multistage stochastic optimization problems. From the computational complexity point of view, these two types of methods seem to be complimentary to each other. Cutting plane type methods can handle multistage problems with a large number of stages, but a relatively smaller number of state (decision) variables. On the other hand, stochastic approximation type methods can only deal with a small number of stages, but a large number of decision variables.
    Modelling Determinants of Cryptocurrency Prices: A Bayesian Network Approach. (arXiv:2303.16148v1 [q-fin.ST])
    The growth of market capitalisation and the number of altcoins (cryptocurrencies other than Bitcoin) provide investment opportunities and complicate the prediction of their price movements. A significant challenge in this volatile and relatively immature market is the problem of predicting cryptocurrency prices which needs to identify the factors influencing these prices. The focus of this study is to investigate the factors influencing altcoin prices, and these factors have been investigated from a causal analysis perspective using Bayesian networks. In particular, studying the nature of interactions between five leading altcoins, traditional financial assets including gold, oil, and S\&P 500, and social media is the research question. To provide an answer to the question, we create causal networks which are built from the historic price data of five traditional financial assets, social media data, and price data of altcoins. The ensuing networks are used for causal reasoning and diagnosis, and the results indicate that social media (in particular Twitter data in this study) is the most significant influencing factor of the prices of altcoins. Furthermore, it is not possible to generalise the coins' reactions against the changes in the factors. Consequently, the coins need to be studied separately for a particular price movement investigation.
    TOFA: Transfer-Once-for-All. (arXiv:2303.15485v1 [cs.LG])
    Weight-sharing neural architecture search aims to optimize a configurable neural network model (supernet) for a variety of deployment scenarios across many devices with different resource constraints. Existing approaches use evolutionary search to extract a number of models from a supernet trained on a very large data set, and then fine-tune the extracted models on the typically small, real-world data set of interest. The computational cost of training thus grows linearly with the number of different model deployment scenarios. Hence, we propose Transfer-Once-For-All (TOFA) for supernet-style training on small data sets with constant computational training cost over any number of edge deployment scenarios. Given a task, TOFA obtains custom neural networks, both the topology and the weights, optimized for any number of edge deployment scenarios. To overcome the challenges arising from small data, TOFA utilizes a unified semi-supervised training loss to simultaneously train all subnets within the supernet, coupled with on-the-fly architecture selection at deployment time.
    Scaling Multi-Objective Security Games Provably via Space Discretization Based Evolutionary Search. (arXiv:2303.15821v1 [cs.LG])
    In the field of security, multi-objective security games (MOSGs) allow defenders to simultaneously protect targets from multiple heterogeneous attackers. MOSGs aim to simultaneously maximize all the heterogeneous payoffs, e.g., life, money, and crime rate, without merging heterogeneous attackers. In real-world scenarios, the number of heterogeneous attackers and targets to be protected may exceed the capability of most existing state-of-the-art methods, i.e., MOSGs are limited by the issue of scalability. To this end, this paper proposes a general framework called SDES based on many-objective evolutionary search to scale up MOSGs to large-scale targets and heterogeneous attackers. SDES consists of four consecutive key components, i.e., discretization, optimization, restoration and evaluation, and refinement. Specifically, SDES first discretizes the originally high-dimensional continuous solution space to the low-dimensional discrete one by the maximal indifference property in game theory. This property helps evolutionary algorithms (EAs) bypass the high-dimensional step function and ensure a well-convergent Pareto front. Then, a many-objective EA is used for optimization in the low-dimensional discrete solution space to obtain a well-spaced Pareto front. To evaluate solutions, SDES restores solutions back to the original space via bit-wisely optimizing a novel solution divergence. Finally, the refinement in SDES boosts the optimization performance with acceptable cost. Theoretically, we prove the optimization consistency and convergence of SDES. Experiment results show that SDES is the first linear-time MOSG algorithm for both large-scale attackers and targets. SDES is able to solve up to 20 attackers and 100 targets MOSG problems, while the state-of-the-art methods can only solve up to 8 attackers and 25 targets ones. Ablation study verifies the necessity of all components in SDES.
    Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise. (arXiv:2303.15740v1 [cs.LG])
    In this work, we study the concentration behavior of a stochastic approximation (SA) algorithm under a contractive operator with respect to an arbitrary norm. We consider two settings where the iterates are potentially unbounded: (1) bounded multiplicative noise, and (2) additive sub-Gaussian noise. We obtain maximal concentration inequalities on the convergence errors, and show that these errors have sub-Gaussian tails in the additive noise setting, and super-polynomial tails (faster than polynomial decay) in the multiplicative noise setting. In addition, we provide an impossibility result showing that it is in general not possible to achieve sub-exponential tails for SA with multiplicative noise. To establish these results, we develop a novel bootstrapping argument that involves bounding the moment generating function of the generalized Moreau envelope of the error and the construction of an exponential supermartingale to enable using Ville's maximal inequality. To demonstrate the applicability of our theoretical results, we use them to provide maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and $Q$-learning. To the best of our knowledge, super-polynomial concentration bounds for off-policy TD-learning have not been established in the literature due to the challenge of handling the combination of unbounded iterates and multiplicative noise.
    Mathematical Challenges in Deep Learning. (arXiv:2303.15464v1 [cs.LG])
    Deep models are dominating the artificial intelligence (AI) industry since the ImageNet challenge in 2012. The size of deep models is increasing ever since, which brings new challenges to this field with applications in cell phones, personal computers, autonomous cars, and wireless base stations. Here we list a set of problems, ranging from training, inference, generalization bound, and optimization with some formalism to communicate these challenges with mathematicians, statisticians, and theoretical computer scientists. This is a subjective view of the research questions in deep learning that benefits the tech industry in long run.
    Online Non-Destructive Moisture Content Estimation of Filter Media During Drying Using Artificial Neural Networks. (arXiv:2303.15570v1 [cs.LG])
    Moisture content (MC) estimation is important in the manufacturing process of drying bulky filter media products as it is the prerequisite for drying optimization. In this study, a dataset collected by performing 161 drying industrial experiments is described and a methodology for MC estimation in an non-destructive and online manner during industrial drying is presented. An artificial neural network (ANN) based method is compared to state-of-the-art MC estimation methods reported in the literature. Results of model fitting and training show that a three-layer Perceptron achieves the lowest error. Experimental results show that ANNs combined with oven settings data, drying time and product temperature can be used to reliably estimate the MC of bulky filter media products.
    Efficient Quality Diversity Optimization of 3D Buildings through 2D Pre-optimization. (arXiv:2303.15896v1 [cs.NE])
    Quality diversity algorithms can be used to efficiently create a diverse set of solutions to inform engineers' intuition. But quality diversity is not efficient in very expensive problems, needing 100.000s of evaluations. Even with the assistance of surrogate models, quality diversity needs 100s or even 1000s of evaluations, which can make it use infeasible. In this study we try to tackle this problem by using a pre-optimization strategy on a lower-dimensional optimization problem and then map the solutions to a higher-dimensional case. For a use case to design buildings that minimize wind nuisance, we show that we can predict flow features around 3D buildings from 2D flow features around building footprints. For a diverse set of building designs, by sampling the space of 2D footprints with a quality diversity algorithm, a predictive model can be trained that is more accurate than when trained on a set of footprints that were selected with a space-filling algorithm like the Sobol sequence. Simulating only 16 buildings in 3D, a set of 1024 building designs with low predicted wind nuisance is created. We show that we can produce better machine learning models by producing training data with quality diversity instead of using common sampling techniques. The method can bootstrap generative design in a computationally expensive 3D domain and allow engineers to sweep the design space, understanding wind nuisance in early design phases.
    Randomly Initialized Subnetworks with Iterative Weight Recycling. (arXiv:2303.15953v1 [cs.LG])
    The Multi-Prize Lottery Ticket Hypothesis posits that randomly initialized neural networks contain several subnetworks that achieve comparable accuracy to fully trained models of the same architecture. However, current methods require that the network is sufficiently overparameterized. In this work, we propose a modification to two state-of-the-art algorithms (Edge-Popup and Biprop) that finds high-accuracy subnetworks with no additional storage cost or scaling. The algorithm, Iterative Weight Recycling, identifies subsets of important weights within a randomly initialized network for intra-layer reuse. Empirically we show improvements on smaller network architectures and higher prune rates, finding that model sparsity can be increased through the "recycling" of existing weights. In addition to Iterative Weight Recycling, we complement the Multi-Prize Lottery Ticket Hypothesis with a reciprocal finding: high-accuracy, randomly initialized subnetwork's produce diverse masks, despite being generated with the same hyperparameter's and pruning strategy. We explore the landscapes of these masks, which show high variability.
    Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis. (arXiv:2303.15479v1 [cs.LG])
    In this paper, we explore the performance of different pruning methods in the context of the lottery ticket hypothesis. We compare the performance of L1 unstructured pruning, Fisher pruning, and random pruning on different network architectures and pruning scenarios. The experiments include an evaluation of one-shot and iterative pruning, an examination of weight movement in the network during pruning, a comparison of the pruning methods on networks of varying widths, and an analysis of the performance of the methods when the network becomes very sparse. Additionally, we propose and evaluate a new method for efficient computation of Fisher pruning, known as batched Fisher pruning.
    Edge Selection and Clustering for Federated Learning in Optical Inter-LEO Satellite Constellation. (arXiv:2303.16071v1 [cs.LG])
    Low-Earth orbit (LEO) satellites have been prosperously deployed for various Earth observation missions due to its capability of collecting a large amount of image or sensor data. However, traditionally, the data training process is performed in the terrestrial cloud server, which leads to a high transmission overhead. With the recent development of LEO, it is more imperative to provide ultra-dense LEO constellation with enhanced on-board computation capability. Benefited from it, we have proposed a collaborative federated learning over LEO satellite constellation (FedLEO). We allocate the entire process on LEOs with low payload inter-satellite transmissions, whilst the low-delay terrestrial gateway server (GS) only takes care for initial signal controlling. The GS initially selects an LEO server, whereas its LEO clients are all determined by clustering mechanism and communication capability through the optical inter-satellite links (ISLs). The re-clustering of changing LEO server will be executed once with low communication quality of FedLEO. In the simulations, we have numerically analyzed the proposed FedLEO under practical Walker-based LEO constellation configurations along with MNIST training dataset for classification mission. The proposed FedLEO outperforms the conventional centralized and distributed architectures with higher classification accuracy as well as comparably lower latency of joint communication and computing.
    Feature Engineering Methods on Multivariate Time-Series Data for Financial Data Science Competitions. (arXiv:2303.16117v1 [q-fin.ST])
    We apply different feature engineering methods for time-series to US market price data. The predictive power of models are tested against Numerai-Signals targets.
    Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation. (arXiv:2303.15486v1 [cs.LG])
    Multimodal learning has seen great success mining data features from multiple modalities with remarkable model performance improvement. Meanwhile, federated learning (FL) addresses the data sharing problem, enabling privacy-preserved collaborative training to provide sufficient precious data. Great potential, therefore, arises with the confluence of them, known as multimodal federated learning. However, limitation lies in the predominant approaches as they often assume that each local dataset records samples from all modalities. In this paper, we aim to bridge this gap by proposing an Unimodal Training - Multimodal Prediction (UTMP) framework under the context of multimodal federated learning. We design HA-Fedformer, a novel transformer-based model that empowers unimodal training with only a unimodal dataset at the client and multimodal testing by aggregating multiple clients' knowledge for better accuracy. The key advantages are twofold. Firstly, to alleviate the impact of data non-IID, we develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling. Secondly, to overcome the challenge of unaligned language sequence, we implement a cross-modal decoder aggregation to capture the hidden signal correlation between decoders trained by data from different modalities. Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models under the UTMP federated learning frameworks, with 15%-20% improvement on most attributes.
    qEUBO: A Decision-Theoretic Acquisition Function for Preferential Bayesian Optimization. (arXiv:2303.15746v1 [cs.LG])
    Preferential Bayesian optimization (PBO) is a framework for optimizing a decision maker's latent utility function using preference feedback. This work introduces the expected utility of the best option (qEUBO) as a novel acquisition function for PBO. When the decision maker's responses are noise-free, we show that qEUBO is one-step Bayes optimal and thus equivalent to the popular knowledge gradient acquisition function. We also show that qEUBO enjoys an additive constant approximation guarantee to the one-step Bayes-optimal policy when the decision maker's responses are corrupted by noise. We provide an extensive evaluation of qEUBO and demonstrate that it outperforms the state-of-the-art acquisition functions for PBO across many settings. Finally, we show that, under sufficient regularity conditions, qEUBO's Bayesian simple regret converges to zero at a rate $o(1/n)$ as the number of queries, $n$, goes to infinity. In contrast, we show that simple regret under qEI, a popular acquisition function for standard BO often used for PBO, can fail to converge to zero. Enjoying superior performance, simple computation, and a grounded decision-theoretic justification, qEUBO is a promising acquisition function for PBO.
    TraffNet: Learning Causality of Traffic Generation for Road Network Digital Twins. (arXiv:2303.15954v1 [cs.LG])
    Road network digital twins (RNDTs) play a critical role in the development of next-generation intelligent transportation systems, enabling more precise traffic planning and control. To support just-in-time (JIT) decision making, RNDTs require a model that dynamically learns the traffic patterns from online sensor data and generates high-fidelity simulation results. Although current traffic prediction techniques based on graph neural networks have achieved state-of-the-art performance, these techniques only predict future traffic by mining correlations in historical traffic data, disregarding the causes of traffic generation, such as traffic demands and route selection. Therefore, their performance is unreliable for JIT decision making. To fill this gap, we introduce a novel deep learning framework called TraffNet that learns the causality of traffic volume from vehicle trajectory data. First, we use a heterogeneous graph to represent the road network, allowing the model to incorporate causal features of traffic volumes. Next, motivated by the traffic domain knowledge, we propose a traffic causality learning method to learn an embedding vector that encodes travel demands and path-level dependencies for each road segment. Then, we model temporal dependencies to match the underlying process of traffic generation. Finally, the experiments verify the utility of TraffNet. The code of TraffNet is available at https://github.com/mayunyi-1999/TraffNet_code.git.
    Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models. (arXiv:2303.16133v1 [cs.CV])
    As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support. Inconsistent AI models are considered brittle and untrustworthy by human users and are more challenging to incorporate into larger systems that take dependencies on their outputs. Measuring consistency between very heterogeneous tasks that might include outputs in different modalities is challenging since it is difficult to determine if the predictions are consistent with one another. As a solution, we introduce a benchmark dataset, COCOCON, where we use contrast sets created by modifying test instances for multiple tasks in small but semantically meaningful ways to change the gold label, and outline metrics for measuring if a model is consistent by ranking the original and perturbed instances across tasks. We find that state-of-the-art systems suffer from a surprisingly high degree of inconsistent behavior across tasks, especially for more heterogeneous tasks. Finally, we propose using a rank correlation-based auxiliary objective computed over large automatically created cross-task contrast sets to improve the multi-task consistency of large unified models, while retaining their original accuracy on downstream tasks. Project website available at https://adymaharana.github.io/cococon/
    Robust PAC$^m$: Training Ensemble Models Under Model Misspecification and Outliers. (arXiv:2203.01859v2 [cs.LG] UPDATED)
    Standard Bayesian learning is known to have suboptimal generalization capabilities under model misspecification and in the presence of outliers. PAC-Bayes theory demonstrates that the free energy criterion minimized by Bayesian learning is a bound on the generalization error for Gibbs predictors (i.e., for single models drawn at random from the posterior) under the assumption of sampling distributions uncontaminated by outliers. This viewpoint provides a justification for the limitations of Bayesian learning when the model is misspecified, requiring ensembling, and when data is affected by outliers. In recent work, PAC-Bayes bounds - referred to as PAC$^m$ - were derived to introduce free energy metrics that account for the performance of ensemble predictors, obtaining enhanced performance under misspecification. This work presents a novel robust free energy criterion that combines the generalized logarithm score function with PAC$^m$ ensemble bounds. The proposed free energy training criterion produces predictive distributions that are able to concurrently counteract the detrimental effects of model misspecification and outliers.
    SFHarmony: Source Free Domain Adaptation for Distributed Neuroimaging Analysis. (arXiv:2303.15965v1 [cs.CV])
    To represent the biological variability of clinical neuroimaging populations, it is vital to be able to combine data across scanners and studies. However, different MRI scanners produce images with different characteristics, resulting in a domain shift known as the `harmonisation problem'. Additionally, neuroimaging data is inherently personal in nature, leading to data privacy concerns when sharing the data. To overcome these barriers, we propose an Unsupervised Source-Free Domain Adaptation (SFDA) method, SFHarmony. Through modelling the imaging features as a Gaussian Mixture Model and minimising an adapted Bhattacharyya distance between the source and target features, we can create a model that performs well for the target data whilst having a shared feature representation across the data domains, without needing access to the source data for adaptation or target labels. We demonstrate the performance of our method on simulated and real domain shifts, showing that the approach is applicable to classification, segmentation and regression tasks, requiring no changes to the algorithm. Our method outperforms existing SFDA approaches across a range of realistic data scenarios, demonstrating the potential utility of our approach for MRI harmonisation and general SFDA problems. Our code is available at \url{https://github.com/nkdinsdale/SFHarmony}.
    Graph Neural Networks for Power Allocation in Wireless Networks with Full Duplex Nodes. (arXiv:2303.16113v1 [cs.NI])
    Due to mutual interference between users, power allocation problems in wireless networks are often non-convex and computationally challenging. Graph neural networks (GNNs) have recently emerged as a promising approach to tackling these problems and an approach that exploits the underlying topology of wireless networks. In this paper, we propose a novel graph representation method for wireless networks that include full-duplex (FD) nodes. We then design a corresponding FD Graph Neural Network (F-GNN) with the aim of allocating transmit powers to maximise the network throughput. Our results show that our F-GNN achieves state-of-art performance with significantly less computation time. Besides, F-GNN offers an excellent trade-off between performance and complexity compared to classical approaches. We further refine this trade-off by introducing a distance-based threshold for inclusion or exclusion of edges in the network. We show that an appropriately chosen threshold reduces required training time by roughly 20% with a relatively minor loss in performance.
    A likelihood approach to nonparametric estimation of a singular distribution using deep generative models. (arXiv:2105.04046v3 [stat.ML] UPDATED)
    We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional manifold, is challenging due to its singularity with respect to the Lebesgue measure in the ambient space. In the considered model, a usual likelihood approach can fail to estimate the target distribution consistently due to the singularity. We prove that a novel and effective solution exists by perturbing the data with an instance noise, which leads to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is sufficiently general to contain various structured distributions such as product distributions, classically smooth distributions and distributions supported on a low-dimensional manifold. Our analysis provides some insights on how deep generative models can avoid the curse of dimensionality for nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to empirically demonstrate that the proposed data perturbation technique improves the estimation performance significantly.
    Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder. (arXiv:2303.15564v1 [cs.LG])
    Deep neural networks are vulnerable to backdoor attacks, where an adversary maliciously manipulates the model behavior through overlaying images with special triggers. Existing backdoor defense methods often require accessing a few validation data and model parameters, which are impractical in many real-world applications, e.g., when the model is provided as a cloud service. In this paper, we address the practical task of blind backdoor defense at test time, in particular for black-box models. The true label of every test image needs to be recovered on the fly from the hard label predictions of a suspicious model. The heuristic trigger search in image space, however, is not scalable to complex triggers or high image resolution. We circumvent such barrier by leveraging generic image generation models, and propose a framework of Blind Defense with Masked AutoEncoder (BDMAE). It uses the image structural similarity and label consistency between the test image and MAE restorations to detect possible triggers. The detection result is refined by considering the topology of triggers. We obtain a purified test image from restorations for making prediction. Our approach is blind to the model architectures, trigger patterns or image benignity. Extensive experiments on multiple datasets with different backdoor attacks validate its effectiveness and generalizability. Code is available at https://github.com/tsun/BDMAE.
    Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning. (arXiv:2303.15579v1 [stat.ML])
    We propose an adjusted Wasserstein distributionally robust estimator -- based on a nonlinear transformation of the Wasserstein distributionally robust (WDRO) estimator in statistical learning. This transformation will improve the statistical performance of WDRO because the adjusted WDRO estimator is asymptotically unbiased and has an asymptotically smaller mean squared error. The adjusted WDRO will not mitigate the out-of-sample performance guarantee of WDRO. Sufficient conditions for the existence of the adjusted WDRO estimator are presented, and the procedure for the computation of the adjusted WDRO estimator is given. Specifically, we will show how the adjusted WDRO estimator is developed in the generalized linear model. Numerical experiments demonstrate the favorable practical performance of the adjusted estimator over the classic one.
    Online Learning for Incentive-Based Demand Response. (arXiv:2303.15617v1 [cs.LG])
    In this paper, we consider the problem of learning online to manage Demand Response (DR) resources. A typical DR mechanism requires the DR manager to assign a baseline to the participating consumer, where the baseline is an estimate of the counterfactual consumption of the consumer had it not been called to provide the DR service. A challenge in estimating baseline is the incentive the consumer has to inflate the baseline estimate. We consider the problem of learning online to estimate the baseline and to optimize the operating costs over a period of time under such incentives. We propose an online learning scheme that employs least-squares for estimation with a perturbation to the reward price (for the DR services or load curtailment) that is designed to balance the exploration and exploitation trade-off that arises with online learning. We show that, our proposed scheme is able to achieve a very low regret of $\mathcal{O}\left((\log{T})^2\right)$ with respect to the optimal operating cost over $T$ days of the DR program with full knowledge of the baseline, and is individually rational for the consumers to participate. Our scheme is significantly better than the averaging type approach, which only fetches $\mathcal{O}(T^{1/3})$ regret.
    Multi-Flow Transmission in Wireless Interference Networks: A Convergent Graph Learning Approach. (arXiv:2303.15544v1 [cs.LG])
    We consider the problem of of multi-flow transmission in wireless networks, where data signals from different flows can interfere with each other due to mutual interference between links along their routes, resulting in reduced link capacities. The objective is to develop a multi-flow transmission strategy that routes flows across the wireless interference network to maximize the network utility. However, obtaining an optimal solution is computationally expensive due to the large state and action spaces involved. To tackle this challenge, we introduce a novel algorithm called Dual-stage Interference-Aware Multi-flow Optimization of Network Data-signals (DIAMOND). The design of DIAMOND allows for a hybrid centralized-distributed implementation, which is a characteristic of 5G and beyond technologies with centralized unit deployments. A centralized stage computes the multi-flow transmission strategy using a novel design of graph neural network (GNN) reinforcement learning (RL) routing agent. Then, a distributed stage improves the performance based on a novel design of distributed learning updates. We provide a theoretical analysis of DIAMOND and prove that it converges to the optimal multi-flow transmission strategy as time increases. We also present extensive simulation results over various network topologies (random deployment, NSFNET, GEANT2), demonstrating the superior performance of DIAMOND compared to existing methods.
    Core-Periphery Principle Guided Redesign of Self-Attention in Transformers. (arXiv:2303.15569v1 [cs.LG])
    Designing more efficient, reliable, and explainable neural network architectures is critical to studies that are based on artificial intelligence (AI) techniques. Previous studies, by post-hoc analysis, have found that the best-performing ANNs surprisingly resemble biological neural networks (BNN), which indicates that ANNs and BNNs may share some common principles to achieve optimal performance in either machine learning or cognitive/behavior tasks. Inspired by this phenomenon, we proactively instill organizational principles of BNNs to guide the redesign of ANNs. We leverage the Core-Periphery (CP) organization, which is widely found in human brain networks, to guide the information communication mechanism in the self-attention of vision transformer (ViT) and name this novel framework as CP-ViT. In CP-ViT, the attention operation between nodes is defined by a sparse graph with a Core-Periphery structure (CP graph), where the core nodes are redesigned and reorganized to play an integrative role and serve as a center for other periphery nodes to exchange information. We evaluated the proposed CP-ViT on multiple public datasets, including medical image datasets (INbreast) and natural image datasets. Interestingly, by incorporating the BNN-derived principle (CP structure) into the redesign of ViT, our CP-ViT outperforms other state-of-the-art ANNs. In general, our work advances the state of the art in three aspects: 1) This work provides novel insights for brain-inspired AI: we can utilize the principles found in BNNs to guide and improve our ANN architecture design; 2) We show that there exist sweet spots of CP graphs that lead to CP-ViTs with significantly improved performance; and 3) The core nodes in CP-ViT correspond to task-related meaningful and important image patches, which can significantly enhance the interpretability of the trained deep model.
    A Framework for Demonstrating Practical Quantum Advantage: Racing Quantum against Classical Generative Models. (arXiv:2303.15626v1 [quant-ph])
    Generative modeling has seen a rising interest in both classical and quantum machine learning, and it represents a promising candidate to obtain a practical quantum advantage in the near term. In this study, we build over a proposed framework for evaluating the generalization performance of generative models, and we establish the first quantitative comparative race towards practical quantum advantage (PQA) between classical and quantum generative models, namely Quantum Circuit Born Machines (QCBMs), Transformers (TFs), Recurrent Neural Networks (RNNs), Variational Autoencoders (VAEs), and Wasserstein Generative Adversarial Networks (WGANs). After defining four types of PQAs scenarios, we focus on what we refer to as potential PQA, aiming to compare quantum models with the best-known classical algorithms for the task at hand. We let the models race on a well-defined and application-relevant competition setting, where we illustrate and demonstrate our framework on 20 variables (qubits) generative modeling task. Our results suggest that QCBMs are more efficient in the data-limited regime than the other state-of-the-art classical generative models. Such a feature is highly desirable in a wide range of real-world applications where the available data is scarce.
    Beyond Accuracy: A Critical Review of Fairness in Machine Learning for Mobile and Wearable Computing. (arXiv:2303.15585v1 [cs.CY])
    The field of mobile, wearable, and ubiquitous computing (UbiComp) is undergoing a revolutionary integration of machine learning. Devices can now diagnose diseases, predict heart irregularities, and unlock the full potential of human cognition. However, the underlying algorithms are not immune to biases with respect to sensitive attributes (e.g., gender, race), leading to discriminatory outcomes. The research communities of HCI and AI-Ethics have recently started to explore ways of reporting information about datasets to surface and, eventually, counter those biases. The goal of this work is to explore the extent to which the UbiComp community has adopted such ways of reporting and highlight potential shortcomings. Through a systematic review of papers published in the Proceedings of the ACM Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) journal over the past 5 years (2018-2022), we found that progress on algorithmic fairness within the UbiComp community lags behind. Our findings show that only a small portion (5%) of published papers adheres to modern fairness reporting, while the overwhelming majority thereof focuses on accuracy or error metrics. In light of these findings, our work provides practical guidelines for the design and development of ubiquitous technologies that not only strive for accuracy but also for fairness.
    Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model. (arXiv:2303.15652v1 [cs.LG])
    We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic probit model with the consumers' preferences as well as price sensitivity varying over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers' preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal set-up, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as functions of time, the temporal variability in the model parameters as well as the strength of the auto-correlation network structure spanning the varied customer segments. Our regret analysis results not only demonstrate asymptotic optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information as policies based on unshrunken models are highly sub-optimal in the aforementioned set-up.
    A Novel Neural Network Approach for Predicting the Arrival Time of Buses for Smart On-Demand Public Transit. (arXiv:2303.15495v1 [cs.LG])
    Among the major public transportation systems in cities, bus transit has its problems, including more accuracy and reliability when estimating the bus arrival time for riders. This can lead to delays and decreased ridership, especially in cities where public transportation is heavily relied upon. A common issue is that the arrival times of buses do not match the schedules, resulting in latency for fixed schedules. According to the study in this paper on New York City bus data, there is an average delay of around eight minutes or 491 seconds mismatch between the bus arrivals and the actual scheduled time. This research paper presents a novel AI-based data-driven approach for estimating the arrival times of buses at each transit point (station). Our approach is based on a fully connected neural network and can predict the arrival time collectively across all bus lines in large metropolitan areas. Our neural-net data-driven approach provides a new way to estimate the arrival time of the buses, which can lead to a more efficient and smarter way to bring the bus transit to the general public. Our evaluation of the network bus system with more than 200 bus lines, and 2 million data points, demonstrates less than 40 seconds of estimated error for arrival times. The inference time per each validation set data point is less than 0.006 ms.
    Multiphysics discovery with moving boundaries using Ensemble SINDy and Peridynamic Differential Operator. (arXiv:2303.15631v1 [cs.LG])
    This study proposes a novel framework for learning the underlying physics of phenomena with moving boundaries. The proposed approach combines Ensemble SINDy and Peridynamic Differential Operator (PDDO) and imposes an inductive bias assuming the moving boundary physics evolve in its own corotational coordinate system. The robustness of the approach is demonstrated by considering various levels of noise in the measured data using the 2D Fisher-Stefan model. The confidence intervals of recovered coefficients are listed, and the uncertainties of the moving boundary positions are depicted by obtaining the solutions with the recovered coefficients. Although the main focus of this study is the Fisher-Stefan model, the proposed approach is applicable to any type of moving boundary problem with a smooth moving boundary front without a mushy region. The code and data for this framework is available at: https://github.com/alicanbekar/MB_PDDO-SINDy.
    Multimodal and multicontrast image fusion via deep generative models. (arXiv:2303.15963v1 [cs.LG])
    Recently, it has become progressively more evident that classic diagnostic labels are unable to reliably describe the complexity and variability of several clinical phenotypes. This is particularly true for a broad range of neuropsychiatric illnesses (e.g., depression, anxiety disorders, behavioral phenotypes). Patient heterogeneity can be better described by grouping individuals into novel categories based on empirically derived sections of intersecting continua that span across and beyond traditional categorical borders. In this context, neuroimaging data carry a wealth of spatiotemporally resolved information about each patient's brain. However, they are usually heavily collapsed a priori through procedures which are not learned as part of model training, and consequently not optimized for the downstream prediction task. This is because every individual participant usually comes with multiple whole-brain 3D imaging modalities often accompanied by a deep genotypic and phenotypic characterization, hence posing formidable computational challenges. In this paper we design a deep learning architecture based on generative models rooted in a modular approach and separable convolutional blocks to a) fuse multiple 3D neuroimaging modalities on a voxel-wise level, b) convert them into informative latent embeddings through heavy dimensionality reduction, c) maintain good generalizability and minimal information loss. As proof of concept, we test our architecture on the well characterized Human Connectome Project database demonstrating that our latent embeddings can be clustered into easily separable subject strata which, in turn, map to different phenotypical information which was not included in the embedding creation process. This may be of aid in predicting disease evolution as well as drug response, hence supporting mechanistic disease understanding and empowering clinical trials.
    From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification. (arXiv:2303.15916v1 [cs.LG])
    Deep learning has proven to be successful in various domains and for different tasks. However, when it comes to private data several restrictions are making it difficult to use deep learning approaches in these application fields. Recent approaches try to generate data privately instead of applying a privacy-preserving mechanism directly, on top of the classifier. The solution is to create public data from private data in a manner that preserves the privacy of the data. In this work, two very prominent GAN-based architectures were evaluated in the context of private time series classification. In contrast to previous work, mostly limited to the image domain, the scope of this benchmark was the time series domain. The experiments show that especially GSWGAN performs well across a variety of public datasets outperforming the competitor DPWGAN. An analysis of the generated datasets further validates the superiority of GSWGAN in the context of time series generation.
    Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery. (arXiv:2303.15975v1 [cs.CV])
    Discovering novel concepts from unlabelled data and in a continuous manner is an important desideratum of lifelong learners. In the literature such problems have been partially addressed under very restricted settings, where either access to labelled data is provided for discovering novel concepts (e.g., NCD) or learning occurs for a limited number of incremental steps (e.g., class-iNCD). In this work we challenge the status quo and propose a more challenging and practical learning paradigm called MSc-iNCD, where learning occurs continuously and unsupervisedly, while exploiting the rich priors from large-scale pre-trained models. To this end, we propose simple baselines that are not only resilient under longer learning scenarios, but are surprisingly strong when compared with sophisticated state-of-the-art methods. We conduct extensive empirical evaluation on a multitude of benchmarks and show the effectiveness of our proposed baselines, which significantly raises the bar.
    Sequential training of GANs against GAN-classifiers reveals correlated "knowledge gaps" present among independently trained GAN instances. (arXiv:2303.15533v1 [cs.LG])
    Modern Generative Adversarial Networks (GANs) generate realistic images remarkably well. Previous work has demonstrated the feasibility of "GAN-classifiers" that are distinct from the co-trained discriminator, and operate on images generated from a frozen GAN. That such classifiers work at all affirms the existence of "knowledge gaps" (out-of-distribution artifacts across samples) present in GAN training. We iteratively train GAN-classifiers and train GANs that "fool" the classifiers (in an attempt to fill the knowledge gaps), and examine the effect on GAN training dynamics, output quality, and GAN-classifier generalization. We investigate two settings, a small DCGAN architecture trained on low dimensional images (MNIST), and StyleGAN2, a SOTA GAN architecture trained on high dimensional images (FFHQ). We find that the DCGAN is unable to effectively fool a held-out GAN-classifier without compromising the output quality. However, StyleGAN2 can fool held-out classifiers with no change in output quality, and this effect persists over multiple rounds of GAN/classifier training which appears to reveal an ordering over optima in the generator parameter space. Finally, we study different classifier architectures and show that the architecture of the GAN-classifier has a strong influence on the set of its learned artifacts.
    A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics. (arXiv:2303.15474v1 [cs.LG])
    This paper proposes a special-purpose system to achieve high-accuracy and high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The system consists of field programmable gate array (FPGA) and application specific integrated circuit (ASIC) working in heterogeneous parallelization. To be specific, a multiplication-less neural network (NN) is deployed on the non-von Neumann (NvN)-based ASIC (SilTerra 180 nm process) to evaluate atomic forces, which is the most computationally expensive part of MD. All other calculations of MD are done using FPGA (Xilinx XC7Z100). It is shown that, to achieve similar-level accuracy, the proposed NvN-based system based on low-end fabrication technologies (180 nm) is 1.6x faster and 10^2-10^3x more energy efficiency than state-of-the-art vN based MLMD using graphics processing units (GPUs) based on much more advanced technologies (12 nm), indicating superiority of the proposed NvN-based heterogeneous parallel architecture.
    Exactly mergeable summaries. (arXiv:2303.15465v1 [cs.LG])
    In the analysis of large/big data sets, aggregation (replacing values of a variable over a group by a single value) is a standard way of reducing the size (complexity) of the data. Data analysis programs provide different aggregation functions. Recently some books dealing with the theoretical and algorithmic background of traditional aggregation functions were published. A problem with traditional aggregation is that often too much information is discarded thus reducing the precision of the obtained results. A much better, preserving more information, summarization of original data can be achieved by representing aggregated data using selected types of complex data. In complex data analysis the measured values over a selected group $A$ are aggregated into a complex object $\Sigma(A)$ and not into a single value. Most of the aggregation functions theory does not apply directly. In our contribution, we present an attempt to start building a theoretical background of complex aggregation. We introduce and discuss exactly mergeable summaries for which it holds for merging of disjoint sets of units \[ \Sigma(A \cup B) = F( \Sigma(A),\Sigma(B)),\qquad \mbox{ for } \quad A\cap B = \emptyset .\]
    Railway Network Delay Evolution: A Heterogeneous Graph Neural Network Approach. (arXiv:2303.15489v1 [cs.LG])
    Railway operations involve different types of entities (stations, trains, etc.), making the existing graph/network models with homogenous nodes (i.e., the same kind of nodes) incapable of capturing the interactions between the entities. This paper aims to develop a heterogeneous graph neural network (HetGNN) model, which can address different types of nodes (i.e., heterogeneous nodes), to investigate the train delay evolution on railway networks. To this end, a graph architecture combining the HetGNN model and the GraphSAGE homogeneous GNN (HomoGNN), called SAGE-Het, is proposed. The aim is to capture the interactions between trains, trains and stations, and stations and other stations on delay evolution based on different edges. In contrast to the traditional methods that require the inputs to have constant dimensions (e.g., in rectangular or grid-like arrays) or only allow homogeneous nodes in the graph, SAGE-Het allows for flexible inputs and heterogeneous nodes. The data from two sub-networks of the China railway network are applied to test the performance and robustness of the proposed SAGE-Het model. The experimental results show that SAGE-Het exhibits better performance than the existing delay prediction methods and some advanced HetGNNs used for other prediction tasks; the predictive performances of SAGE-Het under different prediction time horizons (10/20/30 min ahead) all outperform other baseline methods; Specifically, the influences of train interactions on delay propagation are investigated based on the proposed model. The results show that train interactions become subtle when the train headways increase . This finding directly contributes to decision-making in the situation where conflict-resolution or train-canceling actions are needed.
    Neuronal diversity can improve machine learning for physics and beyond. (arXiv:2204.04348v2 [q-bio.NC] UPDATED)
    Diversity conveys advantages in nature, yet homogeneous neurons typically comprise the layers of artificial neural networks. Here we construct neural networks from neurons that learn their own activation functions, quickly diversify, and subsequently outperform their homogeneous counterparts on image classification and nonlinear regression tasks. Sub-networks instantiate the neurons, which meta-learn especially efficient sets of nonlinear responses. Examples include conventional neural networks classifying digits and forecasting a van der Pol oscillator and a physics-informed Hamiltonian neural network learning H\'enon-Heiles orbits. Such learned diversity provides examples of dynamical systems selecting diversity over uniformity and elucidates the role of diversity in natural and artificial systems.
    Adaptive Riemannian Metrics on SPD Manifolds. (arXiv:2303.15477v1 [cs.LG])
    Symmetric Positive Definite (SPD) matrices have received wide attention in machine learning due to their intrinsic capacity of encoding underlying structural correlation in data. To reflect the non-Euclidean geometry of SPD manifolds, many successful Riemannian metrics have been proposed. However, existing fixed metric tensors might lead to sub-optimal performance for SPD matrices learning, especially for SPD neural networks. To remedy this limitation, we leverage the idea of pullback and propose adaptive Riemannian metrics for SPD manifolds. Moreover, we present comprehensive theories for our metrics. Experiments on three datasets demonstrate that equipped with the proposed metrics, SPD networks can exhibit superior performance.
    Deep Incomplete Multi-view Clustering with Cross-view Partial Sample and Prototype Alignment. (arXiv:2303.15689v1 [cs.LG])
    The success of existing multi-view clustering relies on the assumption of sample integrity across multiple views. However, in real-world scenarios, samples of multi-view are partially available due to data corruption or sensor failure, which leads to incomplete multi-view clustering study (IMVC). Although several attempts have been proposed to address IMVC, they suffer from the following drawbacks: i) Existing methods mainly adopt cross-view contrastive learning forcing the representations of each sample across views to be exactly the same, which might ignore view discrepancy and flexibility in representations; ii) Due to the absence of non-observed samples across multiple views, the obtained prototypes of clusters might be unaligned and biased, leading to incorrect fusion. To address the above issues, we propose a Cross-view Partial Sample and Prototype Alignment Network (CPSPAN) for Deep Incomplete Multi-view Clustering. Firstly, unlike existing contrastive-based methods, we adopt pair-observed data alignment as 'proxy supervised signals' to guide instance-to-instance correspondence construction among views. Then, regarding of the shifted prototypes in IMVC, we further propose a prototype alignment module to achieve incomplete distribution calibration across views. Extensive experimental results showcase the effectiveness of our proposed modules, attaining noteworthy performance improvements when compared to existing IMVC competitors on benchmark datasets.
    Large-scale pretraining on pathological images for fine-tuning of small pathological benchmarks. (arXiv:2303.15693v1 [cs.CV])
    Pretraining a deep learning model on large image datasets is a standard step before fine-tuning the model on small targeted datasets. The large dataset is usually general images (e.g. imagenet2012) while the small dataset can be specialized datasets that have different distributions from the large dataset. However, this 'large-to-small' strategy is not well-validated when the large dataset is specialized and has a similar distribution to small datasets. We newly compiled three hematoxylin and eosin-stained image datasets, one large (PTCGA200) and two magnification-adjusted small datasets (PCam200 and segPANDA200). Major deep learning models were trained with supervised and self-supervised learning methods and fine-tuned on the small datasets for tumor classification and tissue segmentation benchmarks. ResNet50 pretrained with MoCov2, SimCLR, and BYOL on PTCGA200 was better than imagenet2012 pretraining when fine-tuned on PTCGA200 (accuracy of 83.94%, 86.41%, 84.91%, and 82.72%, respectively). ResNet50 pre-trained on PTCGA200 with MoCov2 exceeded the COCOtrain2017-pretrained baseline and was the best in ResNet50 for the tissue segmentation benchmark (mIoU of 63.53% and 63.22%). We found re-training imagenet-pretrained models (ResNet50, BiT-M-R50x1, and ViT-S/16) on PTCGA200 improved downstream benchmarks.
    Regularize implicit neural representation by itself. (arXiv:2303.15484v1 [cs.LG])
    This paper proposes a regularizer called Implicit Neural Representation Regularizer (INRR) to improve the generalization ability of the Implicit Neural Representation (INR). The INR is a fully connected network that can represent signals with details not restricted by grid resolution. However, its generalization ability could be improved, especially with non-uniformly sampled data. The proposed INRR is based on learned Dirichlet Energy (DE) that measures similarities between rows/columns of the matrix. The smoothness of the Laplacian matrix is further integrated by parameterizing DE with a tiny INR. INRR improves the generalization of INR in signal representation by perfectly integrating the signal's self-similarity with the smoothness of the Laplacian matrix. Through well-designed numerical experiments, the paper also reveals a series of properties derived from INRR, including momentum methods like convergence trajectory and multi-scale similarity. Moreover, the proposed method could improve the performance of other signal representation methods.
    Multi-sensor large-scale dataset for multi-view 3D reconstruction. (arXiv:2203.06111v4 [cs.CV] UPDATED)
    We present a new multi-sensor dataset for multi-view 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, Microsoft Kinect, industrial cameras, and structured-light scanner. The scenes are selected to emphasize a diverse set of material properties challenging for existing algorithms. We provide around 1.4 million images of 107 different scenes acquired from 100 viewing directions under 14 lighting conditions. We expect our dataset will be useful for evaluation and training of 3D reconstruction algorithms and for related tasks. The dataset is available at skoltech3d.appliedai.tech.
    Efficient On-device Training via Gradient Filtering. (arXiv:2301.00330v2 [cs.CV] UPDATED)
    Despite its importance for federated learning, continuous learning and many other applications, on-device training remains an open problem for EdgeAI. The problem stems from the large number of operations (e.g., floating point multiplications and additions) and memory consumption required during training by the back-propagation algorithm. Consequently, in this paper, we propose a new gradient filtering approach which enables on-device CNN model training. More precisely, our approach creates a special structure with fewer unique elements in the gradient map, thus significantly reducing the computational complexity and memory consumption of back propagation during training. Extensive experiments on image classification and semantic segmentation with multiple CNN models (e.g., MobileNet, DeepLabV3, UPerNet) and devices (e.g., Raspberry Pi and Jetson Nano) demonstrate the effectiveness and wide applicability of our approach. For example, compared to SOTA, we achieve up to 19$\times$ speedup and 77.1% memory savings on ImageNet classification with only 0.1% accuracy loss. Finally, our method is easy to implement and deploy; over 20$\times$ speedup and 90% energy savings have been observed compared to highly optimized baselines in MKLDNN and CUDNN on NVIDIA Jetson Nano. Consequently, our approach opens up a new direction of research with a huge potential for on-device training.
    Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans. (arXiv:2209.13020v13 [cs.CY] UPDATED)
    We are currently unable to specify human goals and societal values in a way that reliably directs AI behavior. Law-making and legal interpretation form a computational engine that converts opaque human values into legible directives. "Law Informs Code" is the research agenda embedding legal knowledge and reasoning in AI. Similar to how parties to a legal contract cannot foresee every potential contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed bills will be applied, we cannot ex ante specify rules that provably direct good AI behavior. Legal theory and practice have developed arrays of tools to address these specification problems. For instance, legal standards allow humans to develop shared understandings and adapt them to novel situations. In contrast to more prosaic uses of the law (e.g., as a deterrent of bad behavior through the threat of sanction), leveraged as an expression of how humans communicate their goals, and what society values, Law Informs Code. We describe how data generated by legal processes (methods of law-making, statutory interpretation, contract drafting, applications of legal standards, legal reasoning, etc.) can facilitate the robust specification of inherently vague human goals. This increases human-AI alignment and the local usefulness of AI. Toward society-AI alignment, we present a framework for understanding law as the applied philosophy of multi-agent alignment. Although law is partly a reflection of historically contingent political power - and thus not a perfect aggregation of citizen preferences - if properly parsed, its distillation offers the most legitimate computational comprehension of societal values available. If law eventually informs powerful AI, engaging in the deliberative political process to improve law takes on even more meaning.
    GNN-based physics solver for time-independent PDEs. (arXiv:2303.15681v1 [cs.LG])
    Physics-based deep learning frameworks have shown to be effective in accurately modeling the dynamics of complex physical systems with generalization capability across problem inputs. However, time-independent problems pose the challenge of requiring long-range exchange of information across the computational domain for obtaining accurate predictions. In the context of graph neural networks (GNNs), this calls for deeper networks, which, in turn, may compromise or slow down the training process. In this work, we present two GNN architectures to overcome this challenge - the Edge Augmented GNN and the Multi-GNN. We show that both these networks perform significantly better (by a factor of 1.5 to 2) than baseline methods when applied to time-independent solid mechanics problems. Furthermore, the proposed architectures generalize well to unseen domains, boundary conditions, and materials. Here, the treatment of variable domains is facilitated by a novel coordinate transformation that enables rotation and translation invariance. By broadening the range of problems that neural operators based on graph neural networks can tackle, this paper provides the groundwork for their application to complex scientific and industrial settings.
    Privacy-preserving machine learning for healthcare: open challenges and future perspectives. (arXiv:2303.15563v1 [cs.LG])
    Machine Learning (ML) has recently shown tremendous success in modeling various healthcare prediction tasks, ranging from disease diagnosis and prognosis to patient treatment. Due to the sensitive nature of medical data, privacy must be considered along the entire ML pipeline, from model training to inference. In this paper, we conduct a review of recent literature concerning Privacy-Preserving Machine Learning (PPML) for healthcare. We primarily focus on privacy-preserving training and inference-as-a-service, and perform a comprehensive review of existing trends, identify challenges, and discuss opportunities for future research directions. The aim of this review is to guide the development of private and efficient ML models in healthcare, with the prospects of translating research efforts into real-world settings.
    Attention Boosted Autoencoder for Building Energy Anomaly Detection. (arXiv:2303.16097v1 [cs.LG])
    Leveraging data collected from smart meters in buildings can aid in developing policies towards energy conservation. Significant energy savings could be realised if deviations in the building operating conditions are detected early, and appropriate measures are taken. Towards this end, machine learning techniques can be used to automate the discovery of these abnormal patterns in the collected data. Current methods in anomaly detection rely on an underlying model to capture the usual or acceptable operating behaviour. In this paper, we propose a novel attention mechanism to model the consumption behaviour of a building and demonstrate the effectiveness of the model in capturing the relations using sample case studies. A real-world dataset is modelled using the proposed architecture, and the results are presented. A visualisation approach towards understanding the relations captured by the model is also presented.
    Diffusion Maps for Group-Invariant Manifolds. (arXiv:2303.16169v1 [cs.LG])
    In this article, we consider the manifold learning problem when the data set is invariant under the action of a compact Lie group $K$. Our approach consists in augmenting the data-induced graph Laplacian by integrating over orbits under the action of $K$ of the existing data points. We prove that this $K$-invariant Laplacian operator $L$ can be diagonalized by using the unitary irreducible representation matrices of $K$, and we provide an explicit formula for computing the eigenvalues and eigenvectors of $L$. Moreover, we show that the normalized Laplacian operator $L_N$ converges to the Laplace-Beltrami operator of the data manifold with an improved convergence rate, where the improvement grows with the dimension of the symmetry group $K$. This work extends the steerable graph Laplacian framework of Landa and Shkolnisky from the case of $\operatorname{SO}(2)$ to arbitrary compact Lie groups.
    On Feature Scaling of Recursive Feature Machines. (arXiv:2303.15745v1 [cs.LG])
    In this technical report, we explore the behavior of Recursive Feature Machines (RFMs), a type of novel kernel machine that recursively learns features via the average gradient outer product, through a series of experiments on regression datasets. When successively adding random noise features to a dataset, we observe intriguing patterns in the Mean Squared Error (MSE) curves with the test MSE exhibiting a decrease-increase-decrease pattern. This behavior is consistent across different dataset sizes, noise parameters, and target functions. Interestingly, the observed MSE curves show similarities to the "double descent" phenomenon observed in deep neural networks, hinting at new connection between RFMs and neural network behavior. This report lays the groundwork for future research into this peculiar behavior.
    GAS: A Gaussian Mixture Distribution-Based Adaptive Sampling Method for PINNs. (arXiv:2303.15849v1 [cs.LG])
    With recent study of the deep learning in scientific computation, the PINNs method has drawn widespread attention for solving PDEs. Compared with traditional methods, PINNs can efficiently handle high-dimensional problems, while the accuracy is relatively low, especially for highly irregular problems. Inspired by the idea of adaptive finite element methods and incremental learning, we propose GAS, a Gaussian mixture distribution-based adaptive sampling method for PINNs. During the training procedure, GAS uses the current residual information to generate a Gaussian mixture distribution for the sampling of additional points, which are then trained together with history data to speed up the convergence of loss and achieve a higher accuracy. Several numerical simulations on 2d to 10d problems show that GAS is a promising method which achieves the state-of-the-art accuracy among deep solvers, while being comparable with traditional numerical solvers.
    Learning Harmonic Molecular Representations on Riemannian Manifold. (arXiv:2303.15520v1 [cs.LG])
    Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the network expressive power. In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface. HMR offers a multi-resolution representation of molecular geometric and chemical features on 2D Riemannian manifold. We also introduce a harmonic message passing method to realize efficient spectral message passing over the surface manifold for better molecular encoding. Our proposed method shows comparable predictive power to current models in small molecule property prediction, and outperforms the state-of-the-art deep learning models for ligand-binding protein pocket classification and the rigid protein docking challenge, demonstrating its versatility in molecular representation learning.
    Boundary-to-Solution Mapping for Groundwater Flows in a Toth Basin. (arXiv:2303.15659v1 [physics.geo-ph])
    In this paper, the authors propose a new approach to solving the groundwater flow equation in the Toth basin of arbitrary top and bottom topographies using deep learning. Instead of using traditional numerical solvers, they use a DeepONet to produce the boundary-to-solution mapping. This mapping takes the geometry of the physical domain along with the boundary conditions as inputs to output the steady state solution of the groundwater flow equation. To implement the DeepONet, the authors approximate the top and bottom boundaries using truncated Fourier series or piecewise linear representations. They present two different implementations of the DeepONet: one where the Toth basin is embedded in a rectangular computational domain, and another where the Toth basin with arbitrary top and bottom boundaries is mapped into a rectangular computational domain via a nonlinear transformation. They implement the DeepONet with respect to the Dirichlet and Robin boundary condition at the top and the Neumann boundary condition at the impervious bottom boundary, respectively. Using this deep-learning enabled tool, the authors investigate the impact of surface topography on the flow pattern by both the top surface and the bottom impervious boundary with arbitrary geometries. They discover that the average slope of the top surface promotes long-distance transport, while the local curvature controls localized circulations. Additionally, they find that the slope of the bottom impervious boundary can seriously impact the long-distance transport of groundwater flows. Overall, this paper presents a new and innovative approach to solving the groundwater flow equation using deep learning, which allows for the investigation of the impact of surface topography on groundwater flow patterns.
    Bayesian Free Energy of Deep ReLU Neural Network in Overparametrized Cases. (arXiv:2303.15739v1 [cs.LG])
    In many research fields in artificial intelligence, it has been shown that deep neural networks are useful to estimate unknown functions on high dimensional input spaces. However, their generalization performance is not yet completely clarified from the theoretical point of view because they are nonidentifiable and singular learning machines. Moreover, a ReLU function is not differentiable, to which algebraic or analytic methods in singular learning theory cannot be applied. In this paper, we study a deep ReLU neural network in overparametrized cases and prove that the Bayesian free energy, which is equal to the minus log marginal likelihoodor the Bayesian stochastic complexity, is bounded even if the number of layers are larger than necessary to estimate an unknown data-generating function. Since the Bayesian generalization error is equal to the increase of the free energy as a function of a sample size, our result also shows that the Bayesian generalization error does not increase even if a deep ReLU neural network is designed to be sufficiently large or in an opeverparametrized state.
    Pre-training Transformers for Knowledge Graph Completion. (arXiv:2303.15682v1 [cs.CL])
    Learning transferable representation of knowledge graphs (KGs) is challenging due to the heterogeneous, multi-relational nature of graph structures. Inspired by Transformer-based pretrained language models' success on learning transferable representation for texts, we introduce a novel inductive KG representation model (iHT) for KG completion by large-scale pre-training. iHT consists of a entity encoder (e.g., BERT) and a neighbor-aware relational scoring function both parameterized by Transformers. We first pre-train iHT on a large KG dataset, Wikidata5M. Our approach achieves new state-of-the-art results on matched evaluations, with a relative improvement of more than 25% in mean reciprocal rank over previous SOTA models. When further fine-tuned on smaller KGs with either entity and relational shifts, pre-trained iHT representations are shown to be transferable, significantly improving the performance on FB15K-237 and WN18RR.
    Kernel interpolation generalizes poorly. (arXiv:2303.15809v1 [cs.LG])
    One of the most interesting problems in the recent renaissance of the studies in kernel regression might be whether the kernel interpolation can generalize well, since it may help us understand the `benign overfitting henomenon' reported in the literature on deep networks. In this paper, under mild conditions, we show that for any $\varepsilon>0$, the generalization error of kernel interpolation is lower bounded by $\Omega(n^{-\varepsilon})$. In other words, the kernel interpolation generalizes poorly for a large class of kernels. As a direct corollary, we can show that overfitted wide neural networks defined on sphere generalize poorly.
    Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages. (arXiv:2303.15669v1 [eess.AS])
    Neural text-to-speech (TTS) models can synthesize natural human speech when trained on large amounts of transcribed speech. However, collecting such large-scale transcribed data is expensive. This paper proposes an unsupervised pre-training method for a sequence-to-sequence TTS model by leveraging large untranscribed speech data. With our pre-training, we can remarkably reduce the amount of paired transcribed data required to train the model for the target downstream TTS task. The main idea is to pre-train the model to reconstruct de-warped mel-spectrograms from warped ones, which may allow the model to learn proper temporal assignment relation between input and output sequences. In addition, we propose a data augmentation method that further improves the data efficiency in fine-tuning. We empirically demonstrate the effectiveness of our proposed method in low-resource language scenarios, achieving outstanding performance compared to competing methods. The code and audio samples are available at: https://github.com/cnaigithub/SpeechDewarping
    Robustness of Utilizing Feedback in Embodied Visual Navigation. (arXiv:2303.15453v1 [cs.LG])
    This paper presents a framework for training an agent to actively request help in object-goal navigation tasks, with feedback indicating the location of the target object in its field of view. To make the agent more robust in scenarios where a teacher may not always be available, the proposed training curriculum includes a mix of episodes with and without feedback. The results show that this approach improves the agent's performance, even in the absence of feedback.  ( 2 min )
    BACKpropagation through BACK substitution with a BACKslash. (arXiv:2303.15449v1 [math.NA])
    We present a linear algebra formulation of backpropagation which allows the calculation of gradients by using a generically written ``backslash'' or Gaussian elimination on triangular systems of equations. Generally the matrix elements are operators. This paper has three contributions: 1. It is of intellectual value to replace traditional treatments of automatic differentiation with a (left acting) operator theoretic, graph-based approach. 2. Operators can be readily placed in matrices in software in programming languages such as Ju lia as an implementation option. 3. We introduce a novel notation, ``transpose dot'' operator ``$\{\}^{T_\bullet}$'' that allows the reversal of operators. We demonstrate the elegance of the operators approach in a suitable programming language consisting of generic linear algebra operators such as Julia \cite{bezanson2017julia}, and that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra can allow operators as elements of matrices, and without rewriting any code, the software carries through to completion giving the correct answer.  ( 2 min )
  • Open

    Deep Riemannian Networks for EEG Decoding. (arXiv:2212.10426v4 [cs.LG] UPDATED)
    State-of-the-art performance in electroencephalography (EEG) decoding tasks is currently often achieved with either Deep-Learning or Riemannian-Geometry-based decoders. Recently, there is growing interest in Deep Riemannian Networks (DRNs) possibly combining the advantages of both previous classes of methods. However, there are still a range of topics where additional insight is needed to pave the way for a more widespread application of DRNs in EEG. These include architecture design questions such as network size and end-to-end ability as well as model training questions. How these factors affect model performance has not been explored. Additionally, it is not clear how the data within these networks is transformed, and whether this would correlate with traditional EEG decoding. Our study aims to lay the groundwork in the area of these topics through the analysis of DRNs for EEG with a wide range of hyperparameters. Networks were tested on two public EEG datasets and compared with state-of-the-art ConvNets. Here we propose end-to-end EEG SPDNet (EE(G)-SPDNet), and we show that this wide, end-to-end DRN can outperform the ConvNets, and in doing so use physiologically plausible frequency regions. We also show that the end-to-end approach learns more complex filters than traditional band-pass filters targeting the classical alpha, beta, and gamma frequency bands of the EEG, and that performance can benefit from channel specific filtering approaches. Additionally, architectural analysis revealed areas for further improvement due to the possible loss of Riemannian specific information throughout the network. Our study thus shows how to design and train DRNs to infer task-related information from the raw EEG without the need of handcrafted filterbanks and highlights the potential of end-to-end DRNs such as EE(G)-SPDNet for high-performance EEG decoding.  ( 3 min )
    JANA: Jointly Amortized Neural Approximation of Complex Bayesian Models. (arXiv:2302.09125v2 [cs.LG] UPDATED)
    This work proposes ''jointly amortized neural approximation'' (JANA) of intractable likelihood functions and posterior densities arising in Bayesian surrogate modeling and simulation-based inference. We train three complementary networks in an end-to-end fashion: 1) a summary network to compress individual data points, sets, or time series into informative embedding vectors; 2) a posterior network to learn an amortized approximate posterior; and 3) a likelihood network to learn an amortized approximate likelihood. Their interaction opens a new route to amortized marginal likelihood and posterior predictive estimation -- two important ingredients of Bayesian workflows that are often too expensive for standard methods. We benchmark the fidelity of JANA on a variety of simulation models against state-of-the-art Bayesian methods and propose a powerful and interpretable diagnostic for joint calibration. In addition, we investigate the ability of recurrent likelihood networks to emulate complex time series models without resorting to hand-crafted summary statistics.
    Robust PAC$^m$: Training Ensemble Models Under Model Misspecification and Outliers. (arXiv:2203.01859v2 [cs.LG] UPDATED)
    Standard Bayesian learning is known to have suboptimal generalization capabilities under model misspecification and in the presence of outliers. PAC-Bayes theory demonstrates that the free energy criterion minimized by Bayesian learning is a bound on the generalization error for Gibbs predictors (i.e., for single models drawn at random from the posterior) under the assumption of sampling distributions uncontaminated by outliers. This viewpoint provides a justification for the limitations of Bayesian learning when the model is misspecified, requiring ensembling, and when data is affected by outliers. In recent work, PAC-Bayes bounds - referred to as PAC$^m$ - were derived to introduce free energy metrics that account for the performance of ensemble predictors, obtaining enhanced performance under misspecification. This work presents a novel robust free energy criterion that combines the generalized logarithm score function with PAC$^m$ ensemble bounds. The proposed free energy training criterion produces predictive distributions that are able to concurrently counteract the detrimental effects of model misspecification and outliers.
    Spatial-photonic Boltzmann machines: low-rank combinatorial optimization and statistical learning by spatial light modulation. (arXiv:2303.14993v1 [cond-mat.dis-nn] CROSS LISTED)
    The spatial-photonic Ising machine (SPIM) [D. Pierangeli et al., Phys. Rev. Lett. 122, 213902 (2019)] is a promising optical architecture utilizing spatial light modulation for solving large-scale combinatorial optimization problems efficiently. However, the SPIM can accommodate Ising problems with only rank-one interaction matrices, which limits its applicability to various real-world problems. In this Letter, we propose a new computing model for the SPIM that can accommodate any Ising problem without changing its optical implementation. The proposed model is particularly efficient for Ising problems with low-rank interaction matrices, such as knapsack problems. Moreover, the model acquires learning ability and can thus be termed a spatial-photonic Boltzmann machine (SPBM). We demonstrate that learning, classification, and sampling of the MNIST handwritten digit images are achieved efficiently using SPBMs with low-rank interactions. Thus, the proposed SPBM model exhibits higher practical applicability to various problems of combinatorial optimization and statistical learning, without losing the scalability inherent in the SPIM architecture.
    Nonlinear classifiers for ranking problems based on kernelized SVM. (arXiv:2002.11436v2 [cs.LG] UPDATED)
    Many classification problems focus on maximizing the performance only on the samples with the highest relevance instead of all samples. As an example, we can mention ranking problems, accuracy at the top or search engines where only the top few queries matter. In our previous work, we derived a general framework including several classes of these linear classification problems. In this paper, we extend the framework to nonlinear classifiers. Utilizing a similarity to SVM, we dualize the problems, add kernels and propose a componentwise dual ascent method.
    Plasticity Neural Network Based on Astrocytic effects at Critical Period, Synaptic Competition and Strength Rebalance by Current and Mnemonic Brain Plasticity and Synapse Formation. (arXiv:2203.11740v12 [cs.NE] UPDATED)
    In addition to the weights of synaptic shared connections, PNN includes weights of synaptic effective ranges [14-25]. PNN considers synaptic strength balance in dynamic of phagocytosing of synapses and static of constant sum of synapses length [14], and includes the lead behavior of the school of fish. Synapse formation will inhibit dendrites generation in experiments and simulations [15]. The memory persistence gradient of retrograde circuit similar to the Enforcing Resilience in a Spring Boot. The relatively good and inferior gradient information stored in memory engram cells in synapse formation of retrograde circuit like the folds in brain [16]. The controversy was claimed if human hippocampal neurogenesis persists throughout aging, may have a new and longer circuit in late iteration [17,18]. Closing the critical period will cause neurological disorder in experiments and simulations [19]. Considering both negative and positive memories persistence help activate synapse better than only considering positive memory [20]. Astrocytic phagocytosis will avoid the local accumulation of synapses by simulation, lack of astrocytic phagocytosis causes excitatory synapses and functionally impaired synapses accumulate in experiments and lead to destruction of cognition, but local longer synapses and worse results in simulations [21]. It gives relationship of intelligence and cortical thickness, individual differences [22]. The PNN considered the memory engram cells that strengthened Synapses [23]. The effects of PNN's memory structure and tPBM may be the same for powerful penetrability of signals [24].And extracted relatively good or inferior memories both have exponential decay by non-classical quantum computing[25]. Memory persistence inhibit local synaptic accumulation. It may introduce the relatively good and inferior solution in PSO. The simple PNN only has the synaptic phagocytosis.
    A Statistical Model for Predicting Generalization in Few-Shot Classification. (arXiv:2212.06461v2 [cs.LG] UPDATED)
    The estimation of the generalization error of classifiers often relies on a validation set. Such a set is hardly available in few-shot learning scenarios, a highly disregarded shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaussian model of the feature distribution. By estimating the parameters of this model, we are able to predict the generalization error on new classification tasks with few samples. We observe that accurate distance estimates between class-conditional densities are the key to accurate estimates of the generalization performance. Therefore, we propose an unbiased estimator for these distances and integrate it in our numerical analysis. We empirically show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy.
    Dictionary Learning for the Almost-Linear Sparsity Regime. (arXiv:2210.10855v2 [cs.LG] UPDATED)
    Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in dimension $M$, yet to date, the only algorithms which provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which requires super-polynomial time in order to obtain an error which decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method on family of reweighted covariance matrices. We prove that in high enough dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when sparsity is linear in dimension up to logarithmic factors. Moreover, these accuracy guarantees have an ``oracle property" that the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the author's knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime.  ( 2 min )
    Learnability, Sample Complexity, and Hypothesis Class Complexity for Regression Models. (arXiv:2303.16091v1 [stat.ML])
    The goal of a learning algorithm is to receive a training data set as input and provide a hypothesis that can generalize to all possible data points from a domain set. The hypothesis is chosen from hypothesis classes with potentially different complexities. Linear regression modeling is an important category of learning algorithms. The practical uncertainty of the target samples affects the generalization performance of the learned model. Failing to choose a proper model or hypothesis class can lead to serious issues such as underfitting or overfitting. These issues have been addressed by alternating cost functions or by utilizing cross-validation methods. These approaches can introduce new hyperparameters with their own new challenges and uncertainties or increase the computational complexity of the learning algorithm. On the other hand, the theory of probably approximately correct (PAC) aims at defining learnability based on probabilistic settings. Despite its theoretical value, PAC does not address practical learning issues on many occasions. This work is inspired by the foundation of PAC and is motivated by the existing regression learning issues. The proposed approach, denoted by epsilon-Confidence Approximately Correct (epsilon CoAC), utilizes Kullback Leibler divergence (relative entropy) and proposes a new related typical set in the set of hyperparameters to tackle the learnability issue. Moreover, it enables the learner to compare hypothesis classes of different complexity orders and choose among them the optimum with the minimum epsilon in the epsilon CoAC framework. Not only the epsilon CoAC learnability overcomes the issues of overfitting and underfitting, but it also shows advantages and superiority over the well known cross-validation method in the sense of time consumption as well as in the sense of accuracy.  ( 3 min )
    Measuring Classification Decision Certainty and Doubt. (arXiv:2303.14568v2 [stat.ML] UPDATED)
    Quantitative characterizations and estimations of uncertainty are of fundamental importance in optimization and decision-making processes. Herein, we propose intuitive scores, which we call certainty and doubt, that can be used in both a Bayesian and frequentist framework to assess and compare the quality and uncertainty of predictions in (multi-)classification decision machine learning problems.  ( 2 min )
    Understanding and Exploring the Whole Set of Good Sparse Generalized Additive Models. (arXiv:2303.16047v1 [cs.LG])
    In real applications, interaction between machine learning model and domain experts is critical; however, the classical machine learning paradigm that usually produces only a single model does not facilitate such interaction. Approximating and exploring the Rashomon set, i.e., the set of all near-optimal models, addresses this practical challenge by providing the user with a searchable space containing a diverse set of models from which domain experts can choose. We present a technique to efficiently and accurately approximate the Rashomon set of sparse, generalized additive models (GAMs). We present algorithms to approximate the Rashomon set of GAMs with ellipsoids for fixed support sets and use these ellipsoids to approximate Rashomon sets for many different support sets. The approximated Rashomon set serves as a cornerstone to solve practical challenges such as (1) studying the variable importance for the model class; (2) finding models under user-specified constraints (monotonicity, direct editing); (3) investigating sudden changes in the shape functions. Experiments demonstrate the fidelity of the approximated Rashomon set and its effectiveness in solving practical challenges.  ( 2 min )
    Generalization and Stability of Interpolating Neural Networks with Minimal Width. (arXiv:2302.09235v2 [stat.ML] UPDATED)
    We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $\epsilon$ and their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training data achieves training error $O(g(1/T)^2 /T)$ and generalization error $O(g(1/T)^2 /n)$ at iteration $T$, provided there are at least $m=\Omega(g(1/T)^4)$ hidden neurons. We then show that our realizable setting encompasses a special case where data are separable by the model's neural tangent kernel. For this and logistic-loss minimization, we prove the training loss decays at a rate of $\tilde O(1/ T)$ given polylogarithmic number of neurons $m=\Omega(\log^4 (T))$. Moreover, with $m=\Omega(\log^{4} (n))$ neurons and $T\approx n$ iterations, we bound the test loss by $\tilde{O}(1/n)$. Our results differ from existing generalization outcomes using the algorithmic-stability framework, which necessitate polynomial width and yield suboptimal generalization rates. Central to our analysis is the use of a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Eventually, despite the objective's non-convexity, this leads to convergence and generalization-gap bounds that resemble those found in the convex setting of linear logistic regression.  ( 2 min )
    Forecasting Large Realized Covariance Matrices: The Benefits of Factor Models and Shrinkage. (arXiv:2303.16151v1 [q-fin.ST])
    We propose a model to forecast large realized covariance matrices of returns, applying it to the constituents of the S\&P 500 daily. To address the curse of dimensionality, we decompose the return covariance matrix using standard firm-level factors (e.g., size, value, and profitability) and use sectoral restrictions in the residual covariance matrix. This restricted model is then estimated using vector heterogeneous autoregressive (VHAR) models with the least absolute shrinkage and selection operator (LASSO). Our methodology improves forecasting precision relative to standard benchmarks and leads to better estimates of minimum variance portfolios.  ( 2 min )
    qEUBO: A Decision-Theoretic Acquisition Function for Preferential Bayesian Optimization. (arXiv:2303.15746v1 [cs.LG])
    Preferential Bayesian optimization (PBO) is a framework for optimizing a decision maker's latent utility function using preference feedback. This work introduces the expected utility of the best option (qEUBO) as a novel acquisition function for PBO. When the decision maker's responses are noise-free, we show that qEUBO is one-step Bayes optimal and thus equivalent to the popular knowledge gradient acquisition function. We also show that qEUBO enjoys an additive constant approximation guarantee to the one-step Bayes-optimal policy when the decision maker's responses are corrupted by noise. We provide an extensive evaluation of qEUBO and demonstrate that it outperforms the state-of-the-art acquisition functions for PBO across many settings. Finally, we show that, under sufficient regularity conditions, qEUBO's Bayesian simple regret converges to zero at a rate $o(1/n)$ as the number of queries, $n$, goes to infinity. In contrast, we show that simple regret under qEI, a popular acquisition function for standard BO often used for PBO, can fail to converge to zero. Enjoying superior performance, simple computation, and a grounded decision-theoretic justification, qEUBO is a promising acquisition function for PBO.
    A source separation approach to temporal graph modelling for computer networks. (arXiv:2303.15950v1 [cs.CR])
    Detecting malicious activity within an enterprise computer network can be framed as a temporal link prediction task: given a sequence of graphs representing communications between hosts over time, the goal is to predict which edges should--or should not--occur in the future. However, standard temporal link prediction algorithms are ill-suited for computer network monitoring as they do not take account of the peculiar short-term dynamics of computer network activity, which exhibits sharp seasonal variations. In order to build a better model, we propose a source separation-inspired description of computer network activity: at each time step, the observed graph is a mixture of subgraphs representing various sources of activity, and short-term dynamics result from changes in the mixing coefficients. Both qualitative and quantitative experiments demonstrate the validity of our approach.
    Learning to Generalize Provably in Learning to Optimize. (arXiv:2302.11085v2 [cs.LG] UPDATED)
    Learning to optimize (L2O) has gained increasing popularity, which automates the design of optimizers by data-driven approaches. However, current L2O methods often suffer from poor generalization performance in at least two folds: (i) applying the L2O-learned optimizer to unseen optimizees, in terms of lowering their loss function values (optimizer generalization, or ``generalizable learning of optimizers"); and (ii) the test performance of an optimizee (itself as a machine learning model), trained by the optimizer, in terms of the accuracy over unseen data (optimizee generalization, or ``learning to generalize"). While the optimizer generalization has been recently studied, the optimizee generalization (or learning to generalize) has not been rigorously studied in the L2O context, which is the aim of this paper. We first theoretically establish an implicit connection between the local entropy and the Hessian, and hence unify their roles in the handcrafted design of generalizable optimizers as equivalent metrics of the landscape flatness of loss functions. We then propose to incorporate these two metrics as flatness-aware regularizers into the L2O framework in order to meta-train optimizers to learn to generalize, and theoretically show that such generalization ability can be learned during the L2O meta-training process and then transformed to the optimizee loss function. Extensive experiments consistently validate the effectiveness of our proposals with substantially improved generalization on multiple sophisticated L2O models and diverse optimizees. Our code is available at: https://github.com/VITA-Group/Open-L2O/tree/main/Model_Free_L2O/L2O-Entropy.  ( 3 min )
    New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma. (arXiv:2205.08532v5 [cs.DS] UPDATED)
    We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, \delta)$-differential privacy. First, we provide tight lower bounds for private covariance estimation of Gaussian distributions. We show that estimating the covariance matrix in Frobenius norm requires $\Omega(d^2)$ samples, and in spectral norm requires $\Omega(d^{3/2})$ samples, both matching upper bounds up to logarithmic factors. The latter bound verifies the existence of a conjectured statistical gap between the private and the non-private sample complexities for spectral estimation of Gaussian covariances. We prove these bounds via our main technical contribution, a broad generalization of the fingerprinting method to exponential families. Additionally, using the private Assouad method of Acharya, Sun, and Zhang, we show a tight $\Omega(d/(\alpha^2 \varepsilon))$ lower bound for estimating the mean of a distribution with bounded covariance to $\alpha$-error in $\ell_2$-distance. Prior known lower bounds for all these problems were either polynomially weaker or held under the stricter condition of $(\varepsilon, 0)$-differential privacy.  ( 2 min )
    Sparse Gaussian Processes with Spherical Harmonic Features Revisited. (arXiv:2303.15948v1 [stat.ML])
    We revisit the Gaussian process model with spherical harmonic features and study connections between the associated RKHS, its eigenstructure and deep models. Based on this, we introduce a new class of kernels which correspond to deep models of continuous depth. In our formulation, depth can be estimated as a kernel hyper-parameter by optimizing the evidence lower bound. Further, we introduce sparseness in the eigenbasis by variational learning of the spherical harmonic phases. This enables scaling to larger input dimensions than previously, while also allowing for learning of high frequency variations. We validate our approach on machine learning benchmark datasets.  ( 2 min )
    Communication-Efficient Distributed Estimation and Inference for Cox's Model. (arXiv:2302.12111v2 [stat.ME] UPDATED)
    Motivated by multi-center biomedical studies that cannot share individual data due to privacy and ownership concerns, we develop communication-efficient iterative distributed algorithms for estimation and inference in the high-dimensional sparse Cox proportional hazards model. We demonstrate that our estimator, even with a relatively small number of iterations, achieves the same convergence rate as the ideal full-sample estimator under very mild conditions. To construct confidence intervals for linear combinations of high-dimensional hazard regression coefficients, we introduce a novel debiased method, establish central limit theorems, and provide consistent variance estimators that yield asymptotically valid distributed confidence intervals. In addition, we provide valid and powerful distributed hypothesis tests for any coordinate element based on a decorrelated score test. We allow time-dependent covariates as well as censored survival times. Extensive numerical experiments on both simulated and real data lend further support to our theory and demonstrate that our communication-efficient distributed estimators, confidence intervals, and hypothesis tests improve upon alternative methods.  ( 2 min )
    One Transformer Can Understand Both 2D & 3D Molecular Data. (arXiv:2210.01765v4 [cs.LG] UPDATED)
    Unlike vision and language data which usually has a unique format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in a 3D space. For molecular representation learning, most previous works designed neural networks only for a particular data format, making the learned models likely to fail for other data formats. We believe a general-purpose neural network model for chemistry should be able to handle molecular tasks across data modalities. To achieve this goal, in this work, we develop a novel Transformer-based Molecular model called Transformer-M, which can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations. Using the standard Transformer as the backbone architecture, Transformer-M develops two separated channels to encode 2D and 3D structural information and incorporate them with the atom features in the network modules. When the input data is in a particular format, the corresponding channel will be activated, and the other will be disabled. By training on 2D and 3D molecular data with properly designed supervised signals, Transformer-M automatically learns to leverage knowledge from different data modalities and correctly capture the representations. We conducted extensive experiments for Transformer-M. All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks, suggesting its broad applicability. The code and models will be made publicly available at https://github.com/lsj2408/Transformer-M.  ( 3 min )
    Complementary Domain Adaptation and Generalization for Unsupervised Continual Domain Shift Learning. (arXiv:2303.15833v1 [cs.LG])
    Continual domain shift poses a significant challenge in real-world applications, particularly in situations where labeled data is not available for new domains. The challenge of acquiring knowledge in this problem setting is referred to as unsupervised continual domain shift learning. Existing methods for domain adaptation and generalization have limitations in addressing this issue, as they focus either on adapting to a specific domain or generalizing to unseen domains, but not both. In this paper, we propose Complementary Domain Adaptation and Generalization (CoDAG), a simple yet effective learning framework that combines domain adaptation and generalization in a complementary manner to achieve three major goals of unsupervised continual domain shift learning: adapting to a current domain, generalizing to unseen domains, and preventing forgetting of previously seen domains. Our approach is model-agnostic, meaning that it is compatible with any existing domain adaptation and generalization algorithms. We evaluate CoDAG on several benchmark datasets and demonstrate that our model outperforms state-of-the-art models in all datasets and evaluation metrics, highlighting its effectiveness and robustness in handling unsupervised continual domain shift learning.  ( 2 min )
    Mathematical Challenges in Deep Learning. (arXiv:2303.15464v1 [cs.LG])
    Deep models are dominating the artificial intelligence (AI) industry since the ImageNet challenge in 2012. The size of deep models is increasing ever since, which brings new challenges to this field with applications in cell phones, personal computers, autonomous cars, and wireless base stations. Here we list a set of problems, ranging from training, inference, generalization bound, and optimization with some formalism to communicate these challenges with mathematicians, statisticians, and theoretical computer scientists. This is a subjective view of the research questions in deep learning that benefits the tech industry in long run.  ( 2 min )
    PDExplain: Contextual Modeling of PDEs in the Wild. (arXiv:2303.15827v1 [cs.LG])
    We propose an explainable method for solving Partial Differential Equations by using a contextual scheme called PDExplain. During the training phase, our method is fed with data collected from an operator-defined family of PDEs accompanied by the general form of this family. In the inference phase, a minimal sample collected from a phenomenon is provided, where the sample is related to the PDE family but not necessarily to the set of specific PDEs seen in the training phase. We show how our algorithm can predict the PDE solution for future timesteps. Moreover, our method provides an explainable form of the PDE, a trait that can assist in modelling phenomena based on data in physical sciences. To verify our method, we conduct extensive experimentation, examining its quality both in terms of prediction error and explainability.  ( 2 min )
    A likelihood approach to nonparametric estimation of a singular distribution using deep generative models. (arXiv:2105.04046v3 [stat.ML] UPDATED)
    We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional manifold, is challenging due to its singularity with respect to the Lebesgue measure in the ambient space. In the considered model, a usual likelihood approach can fail to estimate the target distribution consistently due to the singularity. We prove that a novel and effective solution exists by perturbing the data with an instance noise, which leads to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is sufficiently general to contain various structured distributions such as product distributions, classically smooth distributions and distributions supported on a low-dimensional manifold. Our analysis provides some insights on how deep generative models can avoid the curse of dimensionality for nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to empirically demonstrate that the proposed data perturbation technique improves the estimation performance significantly.  ( 2 min )
    Bayesian Free Energy of Deep ReLU Neural Network in Overparametrized Cases. (arXiv:2303.15739v1 [cs.LG])
    In many research fields in artificial intelligence, it has been shown that deep neural networks are useful to estimate unknown functions on high dimensional input spaces. However, their generalization performance is not yet completely clarified from the theoretical point of view because they are nonidentifiable and singular learning machines. Moreover, a ReLU function is not differentiable, to which algebraic or analytic methods in singular learning theory cannot be applied. In this paper, we study a deep ReLU neural network in overparametrized cases and prove that the Bayesian free energy, which is equal to the minus log marginal likelihoodor the Bayesian stochastic complexity, is bounded even if the number of layers are larger than necessary to estimate an unknown data-generating function. Since the Bayesian generalization error is equal to the increase of the free energy as a function of a sample size, our result also shows that the Bayesian generalization error does not increase even if a deep ReLU neural network is designed to be sufficiently large or in an opeverparametrized state.  ( 2 min )
    Online Learning for Incentive-Based Demand Response. (arXiv:2303.15617v1 [cs.LG])
    In this paper, we consider the problem of learning online to manage Demand Response (DR) resources. A typical DR mechanism requires the DR manager to assign a baseline to the participating consumer, where the baseline is an estimate of the counterfactual consumption of the consumer had it not been called to provide the DR service. A challenge in estimating baseline is the incentive the consumer has to inflate the baseline estimate. We consider the problem of learning online to estimate the baseline and to optimize the operating costs over a period of time under such incentives. We propose an online learning scheme that employs least-squares for estimation with a perturbation to the reward price (for the DR services or load curtailment) that is designed to balance the exploration and exploitation trade-off that arises with online learning. We show that, our proposed scheme is able to achieve a very low regret of $\mathcal{O}\left((\log{T})^2\right)$ with respect to the optimal operating cost over $T$ days of the DR program with full knowledge of the baseline, and is individually rational for the consumers to participate. Our scheme is significantly better than the averaging type approach, which only fetches $\mathcal{O}(T^{1/3})$ regret.  ( 2 min )
    Learning Rate Schedules in the Presence of Distribution Shift. (arXiv:2303.15634v1 [cs.LG])
    We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and we give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.  ( 2 min )
    Repulsive Deep Ensembles are Bayesian. (arXiv:2106.11642v3 [cs.LG] UPDATED)
    Deep ensembles have recently gained popularity in the deep learning community for their conceptual simplicity and efficiency. However, maintaining functional diversity between ensemble members that are independently trained with gradient descent is challenging. This can lead to pathologies when adding more ensemble members, such as a saturation of the ensemble performance, which converges to the performance of a single model. Moreover, this does not only affect the quality of its predictions, but even more so the uncertainty estimates of the ensemble, and thus its performance on out-of-distribution data. We hypothesize that this limitation can be overcome by discouraging different ensemble members from collapsing to the same function. To this end, we introduce a kernelized repulsive term in the update rule of the deep ensembles. We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms the maximum a posteriori inference into proper Bayesian inference. Namely, we show that the training dynamics of our proposed repulsive ensembles follow a Wasserstein gradient flow of the KL divergence with the true posterior. We study repulsive terms in weight and function space and empirically compare their performance to standard ensembles and Bayesian baselines on synthetic and real-world prediction tasks.  ( 2 min )
    Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning. (arXiv:2303.15579v1 [stat.ML])
    We propose an adjusted Wasserstein distributionally robust estimator -- based on a nonlinear transformation of the Wasserstein distributionally robust (WDRO) estimator in statistical learning. This transformation will improve the statistical performance of WDRO because the adjusted WDRO estimator is asymptotically unbiased and has an asymptotically smaller mean squared error. The adjusted WDRO will not mitigate the out-of-sample performance guarantee of WDRO. Sufficient conditions for the existence of the adjusted WDRO estimator are presented, and the procedure for the computation of the adjusted WDRO estimator is given. Specifically, we will show how the adjusted WDRO estimator is developed in the generalized linear model. Numerical experiments demonstrate the favorable practical performance of the adjusted estimator over the classic one.  ( 2 min )

  • Open

    Generative AI Study
    We (Glimpse AI) are doing a small research study in partnership with Nvidia to understand how Generative AI will be used by eCommerce sellers over the coming years. We are looking for some beta testers to try out the software we'll be using during the study. Mainly to test functionality so no need to actually be an eCommerce seller. If you're interested, please fill out this Google form. Thanks! submitted by /u/Accomplished_Head5 [link] [comments]  ( 7 min )
    Does there exist an OCR subtitle converter that turns image-based subtitles into text-based subtitles using a transformer-based model?
    We are in the age of AI. I was wondering if there are any projects on the horizon of subtitle converters that can take in an image-based subtitle like VobSub and HDMV PGS and then turn it into SRT subs? Microsoft just recently released a highly impressive OCR model that was trained on 558 million parameters, named TrOCR-LARGE. TrOCR is a model that uses an image Transformer encoder and an autoregressive text Transformer decoder to perform optical character recognition (OCR). It is pre-trained in 2 stages before being fine-tuned on downstream datasets. Study of Microsoft's TrOCR Hugging Face Documentation GitHub Source Code and code direct from Microsoft submitted by /u/objectivelywrongbro [link] [comments]  ( 43 min )
    AI deepfakes could advance misinformation in the run up to the 2024 election
    submitted by /u/tlubz [link] [comments]  ( 42 min )
    I built a free translation chat app that does AI translations in-app.
    submitted by /u/BrosephSmithSr [link] [comments]  ( 6 min )
    AI is The Biggest Economic Impact of All Time w/ Raoul Pal & Emad Mostaque
    submitted by /u/StevenVincentOne [link] [comments]  ( 42 min )
    I would like to make a thing similar to "Nothing, Forever" or "AtheneAiHeroes". Though I am unsure how
    Please help me, I am not sure how to create the visuals. I understand that for the scripts I can use OpenAI's ChatGPT but I am unsure how to create animations with an ai without training and coding it myself. Please help me (This is just for me not to be put on twitch or on any form of social media) submitted by /u/Repulsive-Bread5749 [link] [comments]  ( 7 min )
    We should use AI voice generators and bots to give scam centers a call and waste their time.
    The scammers will have their time wasted. Imagine a grandma talking for 5 hours to a scammer and then reveals "Well I was a fake AI bot all along" and just wastes their time. Then another call happens and it's another AI grandma bot. Then another one happens and another one until the scam center goes bankrupt. submitted by /u/HappyMan1102 [link] [comments]  ( 43 min )
    Need some AI advice
    Hello all! I’ll try and keep this short. I work for a small medical supply company (they are pretty old fashioned). I work in the accounting department. I’ve been using AI to respond to emails (still proof reading of course) and for a few other small things. I showed my boss and he wants me to reasearch different AI we can start using. There are so AI out there I don’t know where to start. Here is some of the stuff that I think Ai could be used for below. If you guys have any recommendations on what AI I should research or any good tutorials on different AIs for small business. Marketing: currently we spend an astronomical amount of money on google analytics. Is there an AI to help data collection? Or one that helps with the ranking when you search for the company? Accounting: not sure if I want to find one for this or it might take my job lol. But if there is a helpful one please let me know. Really anything you know that you think might be helpful. If you have good educational videos that would be awesome! Thank you all in advance! submitted by /u/Geekthefkeek [link] [comments]  ( 43 min )
    The rate Microsoft is pushing products is insane
    submitted by /u/BrosephSmithSr [link] [comments]  ( 42 min )
    Irrefutable Argument for why AI will lead to massive unemployment
    The Industrial Revolution took away people's jobs in factories and other industries because machines could produce 10x faster than a human worker at 1/10th of the cost, making them 100 times more efficient. However, this did not lead to massive unemployment, because more jobs were able to be made. Why? Because the Industrial Revolution allowed businesses to grow in massive scale and hire many more employees for new, necessary tasks. A shoe producer maybe had 100 factory workers, and needed 100 more employees to run advertising and other operations in the business, for a total of 200 employees. When the machines came in and replaced the factory workers, it meant a loss of 100 jobs. However, the business grew in such size and scale due to the increased shoe production capabilities that the…  ( 52 min )
    Any idea when GPT-4 will be available for all ?
    And will the tools feature be implemented ? (Like asking it to run photoshop on my computer to edit picture itself) submitted by /u/deck4242 [link] [comments]  ( 6 min )
    Gen1 video to video AI released today
    I received an email from runway today telling me that Gen1 has been released today. This allows you to upload video and then manipulate the output by feeding it a new style. This can be text, an image or a pre-trained style such as claymation. It is currently throttled to 3 seconds. Not had time to try it yet. submitted by /u/aleonzzz [link] [comments]  ( 42 min )
    As ML models scale, will consumers get outpriced on advanced models?
    The demand for computational resources is likely to increase as machine learning becomes more mainstream. Scale of improvement in models only means a scale in cost for computation. submitted by /u/BlackstockTy476 [link] [comments]  ( 42 min )
    Does an image lookup AI exist?
    I do not mean either AI generated images or google images. I mean a program like chatgtp, only it generates already existing images based on a longer text input. submitted by /u/FashionableDolphin [link] [comments]  ( 6 min )
    AI-powered glasses that helps you in interviews and on dates 🤯
    submitted by /u/wgmimedia [link] [comments]  ( 46 min )
    If you believe that GPT-4 has no "knowledge", "understanding" or "intelligence", then what is the appropriate word to use for the delta in capability between GPT-2 and GPT-4?
    How will we talk about these things if we eschew these and similar words? submitted by /u/Smallpaul [link] [comments]  ( 6 min )
    Are there any AI tools that can bring a photo to life like it's a video?
    Publicly available ones! I'm a video editor and could imagine this being so powerful, as it's all about retaining user attention and movement does that. If a photo of a person was "alive", almost like how a character in a video game is breathing, looking around a little bit. Does anything like this exist? submitted by /u/TommyASDF [link] [comments]  ( 7 min )
    What would you recommend for making changes to a classic novel
    I’m trying to put together a gift to cheer up a friend who is going through a rough time, and it involves inserting them into a public domain novel (which I’ve got the text file for). What I’m wondering is, rather than me going through all eleventy thousand pages and editing everything manually - is there an AI any of you would recommend that could do this? submitted by /u/MadamJones [link] [comments]  ( 7 min )
    Bard has no chill.
    submitted by /u/KenaiUrsa [link] [comments]  ( 44 min )
    Played Around with Runway's new Gen-1 Video to Video
    submitted by /u/Squidguy83 [link] [comments]  ( 6 min )
    I did it.
    submitted by /u/Accurate_University1 [link] [comments]  ( 42 min )
    YouTube AI translation (CC)
    Hello everyone, I am huge believer in the role AI will play in knocking down language barriers globally. I was wondering if Google has announced anything about improved AI capabilities to the auto translate feature for YouTube videos? There's plenty of content I would like to watch in other languages (and I doubt I am alone in this), however, the current auto translate feature is abysmal. I think such a feature would benefit both content creators and YouTube itself. I personally would love to watch plenty of content in Japanese and in German that is currently out of reach for me due to the language barrier. Anyway, there's big focus on the search engine but has Google even touched upon integrating AI into YouTube in any way? Cheers. submitted by /u/interlocutor_90 [link] [comments]  ( 43 min )
  • Open

    [D] I've got a Job offer but I'm scared
    Hello people. I've got a Junior ML offer from a company (I'm from UK) and I have some questions. Background for me: I have economics bachelor's and applied economics master's and 8 months of Python Developer experience. I've done my master's thesis in statistics and hypothesis testing but nothing hardcore stuff really, pretty basic. I applied for this Junior ML job a couple of weeks ago and I've almost maxed out the technical tests, passed every interview, althought it wasn't really hard honestly. The tests were mostly Data Engineering. The problem is: My math skills are rusty. I've studied algebra, calculus and statistics during university but it was a long time ago and I did not use them ever since. They said I will have time to learn on-the-job and they know I have zero ML experience but what should I expect? I've heard from some people that if you don't create ML algorithms/models from scratch then you should be fine. As far as I know I will be using Tensorflow, Python, Pandas, scikit-learn mostly. Also, I heard a lot of ML devs don't even use math on a daily basis. So with only Python Developer experience and economics degrees and really rusty math (at the moment) should I accept this offer or turn it down? submitted by /u/windonwind [link] [comments]  ( 46 min )
    [D] Prediction time! Lets update those Bayesian priors! How long until human-level AGI?
    Making the bold an unscientific assumption that this sub is at least decently representative of people “in the know” on ML, lets recreate those studies from the 2010 on how long until AGI. Has GPT changed everyone’s prior or are the results still similar? I tried to format the poll similarly to those prediction surveys given to experts so 1:1 comparison is possible. Leave a comment on your pet definition for “human-level AGI” which is 1) testable 2) falsifiable 3) robust Upvote or downvote definitions to let the hive mind decide. View Poll submitted by /u/LanchestersLaw [link] [comments]  ( 45 min )
    Should software developers be concerned of AI? [D]
    I am currently studying to become a software developer and see many devs stressing out about Chat-GPT and Github's CoPilot. Are these concerns valid from an AI/ML expert's perspective? submitted by /u/Chasehud [link] [comments]  ( 45 min )
    Digital twin experiences? [Research]
    Hi r/machinelearning, I'm looking for some assistance and was wondering if I can ask some of you to complete a short survey about your experience in DTs? It's a very short survey will only take 2-3 minutes to complete. I would really appreciate hearing your opinion. This study is related to my undergraduate project, the questions may be a little simple for most of you but nonetheless I'd really value your input. https://forms.office.com/e/maGTEJtCb9ita primarily focussed towards the built environment but any responses help build a picture for the data (anonymous) If you have any questions please feel free to ask 🙏 Thank you, Mark submitted by /u/SciencePublic1234 [link] [comments]  ( 43 min )
    ICML 2023 - What are the chances of a paper with 2 weak accepts (confidence: 5/5, 3/5) and 1 borderline accept (Confidence: 3/5)? [Discussion] [Research]
    This is my first time submitting to ICML. How hard is it usually to get accepted based on these kind of reviews? I assume depending on the distribution of the reviews over all the papers, they might have some upper bound on the number of papers they accept, or, some other constraints? Details: Average Rating: 5.67 (Min: 5, Max: 6)Average Confidence: 3.67 (Min: 3, Max: 5) R1: Rating: 5: Borderline accept: Technically solid paper where reasons to accept outweigh reasons to reject, e.g., limited evaluation. Please use sparingly. Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. R2: <3 Rating: 6: Weak Accept: Technically solid, moderate-to-high impact paper, with no major concerns with respect to evaluation, resources, reproducibility, ethical considerations. Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully. R3: Rating: 6: Weak Accept: Technically solid, moderate-to-high impact paper, with no major concerns with respect to evaluation, resources, reproducibility, ethical considerations. Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. submitted by /u/nobody2611 [link] [comments]  ( 44 min )
    [P] Kangas 2.0: EDA for Computer Vision Datasets
    Project Link: https://github.com/comet-ml/kangas 5 months ago, I shared the initial release of Kangas, a new open source EDA tool that my colleagues and I were working on, here in r/MachineLearning. After collecting feedback from some community members here, and from various other Kangas users, we've finally released version 2.0! Kangas is a tool for viewing and exploring large tables of multimedia data. It allows you to ingest large tables of data—from dataframes, csv's, or other sources—via Kangas' Python library, and construct a data structure we call a DataGrid. From the DataGrid object, you can perform a variety of queries and operations, including rendering the Kangas UI by running DataGrid.show() ​ https://i.redd.it/fj3gvlbo5jqa1.gif We've focused on a handful of features with …  ( 45 min )
    [P] Architecting the Edge for AI/ML
    Check out a new post from team Modzy on Architecting the Edge for AI and ML. This post examines trends driven the intersection of ML and edge computing. We also explore what you need to architect your edge ML/AI systems for flexibility, scalability, and efficiency without breaking the bank. Finally, we discuss the elements needed for an ideal edge architecture, the benefits of that approach, and four edge paradigms for consideration. https://medium.com/getmodzy/architecting-the-edge-for-ai-and-ml-13fccdafab96 submitted by /u/modzykirsten [link] [comments]  ( 43 min )
    "[D]" Is wandb.ai worth using?
    I am not comfortable with idea that the codes I write will be logged into their server. Is there any alternate to wandb which can be hosted locally in my machine or in a common server where a team of people can collaborate ? submitted by /u/frodo_mavinchotil [link] [comments]  ( 44 min )
    [P] I made a calculator that is able to estimate the amount of VRam you need to load a model and suggest an amount for training
    submitted by /u/I_will_delete_myself [link] [comments]  ( 6 min )
    Variance in reported results on ImageNet between papers [D]
    Looking at some old tables: https://arxiv.org/pdf/1512.03385.pdf, Table 4 https://arxiv.org/pdf/1905.11946.pdf, Table 2 Why do the ResNet-152 results vary? E.g. Top-1 error on ImageNet validation set is 19.38 in the original, but 22.2 in the EfficientNet paper. Normally I would assume these type of results would be copied from the previous publication. submitted by /u/kaphed [link] [comments]  ( 44 min )
    [PROJECT] Application of machine learning to manual interpretatoins
    Problem setup: I have time series data where there are typically 5-6 data channels, and each set of measurements will have 10-15k time steps. These data are raw measurements, and typically an expert interpreter will visually assess these data streams (i.e., values vs. time or crossplots of value1 vs. value2) and then based on that assessment will select parameters that go into a series of calculations about the properties of the material being measured. Normally, the interpreter will split the data up into intervals with similar data characteristics and set the parameters for the whole interval. Data Challenges: Interpreting these data streams to the level of detail necessary will typically take a skilled interpreter roughly 4-12 hours. Due to the time required the number of interpreted records available is ~50-100 (which is about a million rows of data (this is all tabular numeric data) but because parameters are picked over intervals it is about 10-15 intervals per record, meaning we really only have 700-1200 actual hand picked points. (The number of training records available could be increased if I could rope in more collaborators, but this has proven difficult). There are thousands of uninterpreted records available. Current Approach: The current approach that people have taken is to mainly feed the raw input into a simple model like a random forest (or some similar ensemble method) or a neural network and try to predict the result. Questions: Are there more sophisticated approaches that could be used to train a model to approach this type of problem? What is the current SOTA for working with manual interpretations? submitted by /u/julysanta [link] [comments]  ( 44 min )
    [D] Tips for training/fine-tuning LLM for low resource languages
    I am trying to either pre-train or fine-tune (most probably) a LLM for a relatively low resource language, in my case Albanian. My goal is to curate a good dataset for this and provide an open source model, since currently there is no good quality model available. I have some questions regarding this, which I would be very happy to have an answer because I have not found a clear explanation from my research online. 1) Currently I have around 3 GB of text data in Albanian. I think training from scratch from such a small dataset wouldn't give good results, so my idea is to fine-tune on a model such as T5 or the GPT models from EleutherAI. However, given that these models might not contain any Albanian training data would you think it would be a good idea? 2) What would be the ideal model size for my dataset size. Is there any equation or rule of thumb that determines this? 3) To get a good langauges model, is there a dataeet recipe that gives best results (i.e books/news/instructions etc.)? 4) Any other tip or thing that might be useful to know? submitted by /u/Son_of_Tetovo [link] [comments]  ( 44 min )
    [D] With ML tools progressing so fast, what are some ways you've taken advantage of them personally?
    This is revolutionary tech, but a lot of the content about their potential focus around "you could have it help you with such and such business" which is cool but the majority of us don't (or can't) directly use it much for business or work. But I'm sure there are still lots of ways to get value out of it, thought we could share some of how we've used it so far. So far I've used generative tech to: Draft simple emails to review Digitize a drinking game and code it for my friends to access online Paste and have it quiz me for a upcoming test Help recommend and summarize books and movies Have a history expert to answer all my questions while I read some history books (somewhat cautious of hallucinations here) And now I just got access to the chatgpt code interpreter alpha and used it for simple side projects, and am looking for inspiration to personally leverage to learn or do things in new beneficial and creative ways. submitted by /u/RedditLovingSun [link] [comments]  ( 7 min )
    [R] What is the state of the art for Logos Image Retrieval?
    I have a directories with a lot of different logos for each brand... given a query logo I would like to Retrieval the most similars, in order to correct classify the brand. What is the state of the art? submitted by /u/bollolo [link] [comments]  ( 43 min )
    [P] Consistency: Diffusion in a Single Forward Pass 🚀
    Hey all! Recently, researchers from OpenAI proposed consistency models, a new family of generative models. It allows us to generate high quality images in a single forward pass, just like good-old GANs and VAEs. ​ training progress on cifar10 ​ I have been working on it and found it definetly works! You can try it with diffusers. import diffusers from diffusers import DiffusionPipeline pipeline = DiffusionPipeline.from_pretrained( "consistency/cifar10-32-demo", custom_pipeline="consistency/pipeline", ) pipeline().images[0] # Super Fast Generation! 🤯 pipeline(steps=5).images[0] # More steps for sample quality It would be fascinating if we could train these models on different datasets and share our results and ideas! 🤗 So, I've made a simple library called consistency that makes it easy to train your own consistency models and publish them. You can check it out here: https://github.com/junhsss/consistency-models I would appreciate any feedback you could provide! submitted by /u/Beautiful-Gur-9456 [link] [comments]  ( 44 min )
    [P] Clustering face embeddings (512d) using GCN's (not knowing the amount of needed clusters)
    Hi, I have used the InsightFace model to detect a bunch of faces in images. It returns face embeddings of 512d. I don't have reference data for the persons on these images, neither do I know how many different identities appear on them. I would love to cluster the face embeddings as good as possible. So far I have tried dbscan and hierarchical clustering, both showing decent results when I manually evaluate the clusters (after playing around with the hyperparams). ​ Now I have been reading about how using GCN's (graph convolutional networks) could lead to better clustering results. Problem is I don't know how to do this. I learned that the nodes in the graph would represent the face embeddings, and that an edge between 2 nodes would imply that 2 embeddings correspond to the face of the same identity. I also learned (by searching online and asking ChatGPT) that you would first need to feed the GCN a thresholded similarity matrix (adjacency matrix), build the GCN model and train it, and eventually cluster the resulting node embeddings using dbscan or spectral clustering or some other clustering algorithm. I don't know to which extent this information is correct. ​ Could somebody with knowledge about GCN's give me some tips on how to work out the code necessary to achieve this? submitted by /u/operative10 [link] [comments]  ( 44 min )
    [PROJECT] Built a new tool using NER to extract data from ANY documents
    Hey guys, I have been working on a tool (for over a couple of months now) that will help extract ANY data you want from ANY documents. Like for example, You want to extract financial data from receipts or medical info from medical docs. We use the CRF algorithm and NER techniques. The tool also makes labeling your documents soo much more easier because the model does most of the labeling for you (Check out the video). Let me know if you would like to discuss more. It's free to use as well. Demo Video - https://youtu.be/uzANQZL2bA0 https://preview.redd.it/3v35cwo6vfqa1.png?width=1912&format=png&auto=webp&s=41ad520935574fc5bcefb56f4986d61a0bf11aa8 submitted by /u/automatonv1 [link] [comments]  ( 43 min )
    [R] Feature Clustering: A Simple Solution to Many Machine Learning Problems
    This sounds like an interesting alternative to PCA for dimensionality reduction. https://mltechniques.com/2023/03/12/feature-clustering-a-simple-solution-to-many-machine-learning-problems/ submitted by /u/Balance- [link] [comments]  ( 43 min )
    [R] LEURN: Learning Explainable Univariate Rules with Neural Networks
    submitted by /u/MLC_Money [link] [comments]  ( 43 min )
    [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data
    GPT-4 and professional benchmarks: the wrong answer to the wrong question OpenAI may have tested on the training data. Besides, human benchmarks are meaningless for bots. Problem 1: training data contamination To benchmark GPT-4’s coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can’t recall. As further evidence for this hypothesis, we tested it on Codeforces problems from different times in 2021. We found that it could regularly solve problems in the easy category before September 5, but none of the problems after September 12. In fact, we can definitively show that it has memorized problems in its training set: when prompted with the title of a Codeforces problem, GPT-4 includes a link to the exact contest where the problem appears (and the round number is almost correct: it is off by one). Note that GPT-4 cannot access the Internet, so memorization is the only explanation. submitted by /u/Balance- [link] [comments]  ( 7 min )
    [D] Small language model suitable for personal-scale pre-training research?
    SOTA LLMs are getting too big, and not even available. For individual researchers who want to try different pre-training strategies/architecture and potentially publish meaningful research, what would be the best way to proceed? Any smaller model suitable for this? (and yet that people would take the result seriously.) submitted by /u/kkimdev [link] [comments]  ( 44 min )
    [D] Take home coding exercise from Reinforcement learning research company
    Hey everyone! I recently applied to a RL research company that I would really like to work at. Long story short they gave me a take home coding exercise which I have not taken yet. They said the exercise requires no ML knowledge and is focused on data structures and algorithms. Also, the description hints that it's one single problem I'll be solving and I'm allowed to use the internet for help as much as I want. I'm just curious what kind of problem this would be and how to best prepare. They said the test will be significantly harder than your average take home coding exercise. Because the company is so specifically RL I'm assuming it's going to be Data Structures and Algorithms relating specifically to that. I would love any and all input you guys have. Thanks! submitted by /u/Dr__cocktor [link] [comments]  ( 44 min )
    [D] Is French the most widely used language in ML circles after English? If not, what are some useful (natural) languages in the field of machine learning?
    I have recently noticed a lot of French content regarding ML stuffs. I am not sure whether it's just because of Bengio and the stuff coming out of UdeMontreal. But looking at this a bit deeper, I know that Yann Lecun is also a native French speaker. In addition, I recently learned that Hugging Face is a French company, and that DeepMind has only 2 offices in non-Anglophone regions and they are Montreal and Paris. I don't know if I am just connecting the dots where there aren't really connections but is there actually a significant amount of ML work (either academia or in industry) being done in French? If not, what are some other natural languages that are widely used in ML? I know English obviously just dominates the whole field and tech sector in general, but I am curious to know what 2nd language might b advantageous and helpful to have in this field. Thank you! submitted by /u/Subject_Ad_9680 [link] [comments]  ( 44 min )
  • Open

    marl-jax: MARL research framework for co-player generalization
    Hey! We are open-sourcing marl-jax. Our JAX based MARL research framework for co-player generalization. We support meltingpot and overcooked environments and have implemented IMPALA and OPRE. We even support fully distributed training. We believe it'll be helpful for anyone getting started in MARL and co-player generalization. https://github.com/kinalmehta/marl-jax submitted by /u/_learning_to_learn [link] [comments]  ( 7 min )
    Can an expert verify whether or not they could replicate the environment used in this paper?
    Is it described in enough detail to be replicable? https://arxiv.org/pdf/1702.03037.pdf submitted by /u/IsDisRielLife [link] [comments]  ( 41 min )
    Why D4PG can use n-step return? Is it misleading in mathematics?
    D4PG calculates the action-value distribution target as n-step distributional return, based on the data from the replay buffer. The problem is as the policy is changing at each environment step, how could it relate different reward signals induced by those changed policies to form a n-step target? This misleading also appears in Rainbow. Can anyone explain it? submitted by /u/OutOfCharm [link] [comments]  ( 42 min )
    pip install stable-baselines3[extra]
    not working on google colab but it use to, This is the error: Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/ Collecting stable-baselines3 Using cached stable_baselines3-1.7.0-py3-none-any.whl (171 kB) Requirement already satisfied: torch>=1.11 in /usr/local/lib/python3.9/dist-packages (from stable-baselines3) (1.13.1+cu116) Requirement already satisfied: matplotlib in /usr/local/lib/python3.9/dist-packages (from stable-baselines3) (3.7.1) Requirement already satisfied: numpy in /usr/local/lib/python3.9/dist-packages (from stable-baselines3) (1.22.4) Collecting gym==0.21 Using cached gym-0.21.0.tar.gz (1.5 MB) error: subprocess-exited-with-error × python setup.py egg_info did not run successfully. │ exit code: 1 ╰─> See above for output. note: This error originates from a subprocess, and is likely not a problem with pip. Preparing metadata (setup.py) ... error error: metadata-generation-failed ​ × Encountered error while generating package metadata. ╰─> See above for output. ​ note: This is an issue with the package mentioned above, not pip. hint: See above for details. submitted by /u/callmeurmaster [link] [comments]  ( 6 min )
    Take home coding exercise from reinforcement learning company
    Hey everyone! I recently applied to a RL research company that I would really like to work at. Long story short they gave me a take home coding exercise which I have not taken yet. They said the exercise requires no ML knowledge and is focused on data structures and algorithms. Also, the description hints that it's one single problem I'll be solving and I'm allowed to use the internet for help as much as I want. I'm just curious what kind of problem this would be and how to best prepare. They said the test will be significantly harder than your average take home coding exercise. Because the company is so specifically RL I'm assuming it's going to be Data Structures and Algorithms relating specifically to that. I would love any and all input you guys have. Thanks! submitted by /u/Dr__cocktor [link] [comments]  ( 42 min )
  • Open

    Leveraging transfer learning for large scale differentially private image classification
    Posted by Harsh Mehta, Software Engineer, and Walid Krichene, Research Scientist, Google Research Large deep learning models are becoming the workhorse of a variety of critical machine learning (ML) tasks. However, it has been shown that without any protection it is plausible for bad actors to attack a variety of models, across modalities, to reveal information from individual training examples. As such, it’s essential to protect against this sort of information leakage. Differential privacy (DP) provides formal protection against an attacker who aims to extract information about the training data. The most popular method for DP training in deep learning is differentially private stochastic gradient descent (DP-SGD). The core recipe implements a common theme in DP: “fuzzing” an algorit…  ( 93 min )
  • Open

    Enable predictive maintenance for line of business users with Amazon Lookout for Equipment
    Predictive maintenance is a data-driven maintenance strategy for monitoring industrial assets in order to detect anomalies in equipment operations and health that could lead to equipment failures. Through proactive monitoring of an asset’s condition, maintenance personnel can be alerted before issues occur, thereby avoiding costly unplanned downtime, which in turn leads to an increase in […]  ( 10 min )
  • Open

    Bottlenecks in AI progress: ML research > Compute > Data > ML engineers?
    Is that the right ranking of bottlenecks e.g. toward improving GPT-N? Also if compute is actually a bottleneck, what aspect of it is the bottleneck – it wouldn't be capital, because OpenAI has essentially unlimited $ from Microsoft at this point. Is it ability to effectively utilize more hardware? Or is it literal supply constraints and Nvidia can't provide enough A100s/H100s? submitted by /u/TikkunCreation [link] [comments]  ( 41 min )
    OpenAI's compute spending
    OpenAI is training GPT-5 on around $225 million worth of Nvidia GPUs. Given the amount of capital that they have access to and the scaling laws, why aren't they spending 10x that amount of compute? Is it a resource constraint (seems unlikely) or is there some reason why the performance of the resulting model is not as compute bottlenecked as I'm imagining? submitted by /u/TikkunCreation [link] [comments]  ( 41 min )
    Instruct-NeRF2NeRF: An AI Method For Editing 3D Scenes With Text-Instructions
    Instruct-NeRF2NeRF is a technique developed by researchers from UC Berkeley that allows for the modification of 3D NeRF (Neural Radiance Fields) scenes using written instructions. This method is designed to make 3D editing more accessible and approachable for users, especially those who are not experts in 3D modeling. submitted by /u/ABDULKADER90H [link] [comments]  ( 7 min )
    In an interview, Ilya from OpenAI mentions that many areas of research are understanding constrained. Understanding what the NN is doing and why. Are there any published open examples of NNs along with open questions about what it's doing and why, that people can try and make progress on?
    "I would say it's a combination. Coming up with whole new ideas is a modest part of the work. Certainly coming up with new ideas is important but even more important is to understand the results, to understand the existing ideas, to understand what's going on. ​ A neural net is a very complicated system, right? And you ran it, and you get some behavior, which is hard to understand. What's going on? Understanding the results, figuring out what next experiment to run, a lot of the time is spent on that. Understanding what could be wrong, what could have caused the neural net to produce a result which was not expected. " ​ "At least in my mind, when you say come up with new ideas, I'm like — Oh, what happens if it did such and such? Whereas understanding it's more like — What is this whole thing? What are the real underlying phenomena that are going on? What are the underlying effects? Why are we doing things this way and not another way? And of course, this is very adjacent to what can be described as coming up with ideas. But the understanding part is where the real action takes place." ​ I'd like to see some open understanding challenges in ML research, where I can run some experiments myself. Where can I look for this? ​ And as a related note, where could I look to hire someone to tutor me in ML research, if I want to be able to get up to speed of one of the cutting edge open ML research (alignment or capabilities, and specifically related to understanding how the NN is working) problems that's bottlenecking companies like OpenAI? submitted by /u/TikkunCreation [link] [comments]  ( 44 min )
  • Open

    Topological sort
    When I left academia [1] my first job was working as a programmer. I was very impressed by a new programmer we hired who hit the ground running. His first week he looked at some problem we were working on and said “Oh, you need a topological sort.” I’d never heard of a topological sort […] Topological sort first appeared on John D. Cook.  ( 6 min )

  • Open

    [D] FOMO on the rapid pace of LLMs
    Hi all, I recently read this reddit post about a 2D modeler experiencing an existential crisis about their job being disrupted by midjourney (HN discussion here). I can't help but feel the same as someone who has been working in the applied ML space for the past few years. Despite my background in "classical" ML, I'm feeling some anxiety about the rapid pace of LLM development and face a fear of missing out / being left behind. I'd love to get involved again in ML research apart from my day job, but one of the biggest obstacles is the fact that training most of foundational LLM research requires huge compute more than anything else [1]. I understand that there are some directions in distributing compute (https://petals.ml), or distilling existing models (https://arxiv.org/abs/2106.09685). I thought I might not be the only one being humbled by the recent advances in ChatGPT, etc. and wanted to hear how other people feel / are getting involved. -- [1] I can't help but be reminded of Sutton's description of the "bitter lesson" of modern AI research: "breakthrough progress eventually arrives by an opposing approach based on scaling computation... eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach." submitted by /u/00001746 [link] [comments]  ( 47 min )
    [P] two copies of gpt-3.5 (one playing as the oracle, and another as the guesser) performs poorly on the game of 20 Questions (68/1823).
    I put two copies of gpt-3.5 as partners, one plays the role of the oracle that answers yes/no questions, the other as the role of a guesser that asks yes/no questions. I want to see if gpt-3.5 would perform well on this "dynamic" task -- i.e. rather than a fixed test set with 1 good answer, 20 questions can be played into many paths, depending on the questions being asked. the result is poor 68 / 1823 ​ 20 Questions forces the guesser to be cohesive in a long chain of yes / no predicates. You want an actually difficult and consistent world model? This is a good one that is combinatorially complex. ... 20 Questions (and other interactive, self-play tasks) is worth looking at in evaluating LLMs. for more details see blog post: https://evanthebouncy.medium.com/llm-self-play-on-20-questions-dee7a8c63377 ​ I'd be happy to answer some questions here as well --evan submitted by /u/evanthebouncy [link] [comments]  ( 44 min )
    [R] Is my ALiBi mask correct?
    I’m working on some transformer related problems and I was trying to implement ALiBi from scratch. I have a specific question about what the alibi mask should look like. The paper says something, but my understanding of the code says something else. Let’s say we generate an attention mask using ALiBi for a sequence length of 4 (small for visualization purposes). Which of the following options is the correct one (first head only): ``` Option 1: [[0.0, -inf, -inf, -inf], [0.0, 0.25, -inf, -inf], [0.0, 0.25, 0.5, -inf], [0.0, 0.25, 0.5, 0.75]], Option 2 (Diagonal of 0s): [[0.0, ,inf, -inf, -inf], [0.0, 0.0, -inf, -inf], [0.0, 0.25, 0.0, -inf], [0.0, 0.25, 0.5, 0.0]], ``` From reading the paper I believe it should be option 2, from my trying to follow what the code implementations is doing, I believe it should be option 1. The only thing different is the 0s diagonal. Thanks for the help. submitted by /u/AlternativeDish5596 [link] [comments]  ( 7 min )
    [D] Unity vs Unreal for Machine Learning?
    I want to make a game focused on Machine Learning. I've been searching about the two engines without reaching a conclusion. I ask you to guide me here. I already know the basics of Unity, C# and Python. I don't mind studying a new engine or language like c++. Which of the 2 engines is best for ML? I know it's a little vague question, but please, try to give me some highlights of the resources of both engines to help me picking the best path. I'm specifically looking for what will make my life easier in the long run. submitted by /u/felipebsr [link] [comments]  ( 45 min )
    [D] 3d model generation
    [D] Hello, everyone. I watched an explanation on the use of diffusion models for creation of 2d images. I just wonder, I think we are somewhat far away from 3d model generation. First, I think it would be much more computationally expensive. Second, I am not sure whether we have such a large set of training data. And third, the input and output that we have in 3d graphics is somewhat different from pixels, i.e. we are working with triangles in 3d graphics (maybe this is not as hard, as we can always start with vertices and then estimate triangles. What's your take on that? submitted by /u/konstantin_lozev [link] [comments]  ( 44 min )
    [D] ICML2023 Review Experience Thread
    Now that the author-reviewer discussion period for ICML 2023 has ended, it seems like it is up to the meta reviewers to decide. Let us discuss our experiences with the revised process. The general consensus I have seen online is that there were more low quality / absent reviewers than usual, but it is unknown how common it was. For authors, how were your reviews, and how was the author-reviewer period? Did your scores change? Was anything off about your review? I’ll start: I got one terrible score and one borderline score. The terrible score reviewer made basic factual errors in their criticism. No follow up after rebuttal. Also note we were unable to get a third reviewer. submitted by /u/Optimal-Asshole [link] [comments]  ( 48 min )
    [P] 🎉 Announcing Auto-Analyst: An open-source AI tool for data analytics! 🎉
    Auto-Analyst leverages power of cutting-edge Large Language Models (LLMs) to revolutionize data analytics. This powerful UI tool simplifies the data analysis process, eliminating the need for complex coding. 🔎 Key Features of Auto-Analyst: Streamlined data analysis process utilizing advanced AI technology and LLMs Enhanced productivity and efficiency through intuitive language-based commands Increased accessibility to data analysis for professionals across industries 🔗 Want to explore and contribute to the project? Head over to the GitHub repo: https://github.com/aadityaubhat/auto-analyst submitted by /u/aadityaubhat [link] [comments]  ( 45 min )
    [P] Graph mining/exploration for subpath identification based on edge values
    Problem statement: I have a sparse directed graph (about 6000-10000 nodes) with no node attributes, and numerical edge values. (The edge values are calculated by the same program based on data regarding the nodes, based on a statistical formula, if it's important) Goal: I want to find paths within the graph that have significantly higher edge values than the rest of the paths' edges (edge values are relative). I thought about graph clustering and partitioning but don't care about how highly connected a particular node is, and from my (elementary) understanding, these methods are not really well-suited for paths. I thought about doing a variation of iterative deepening search that starts on every node that has 0 incoming edges (and terminates when the last explored node has a small number of outgoing edges with small edge values), but these first edges that the search encounters may have smaller values than edges further down the paths, so if I use a traditional search algorithm, it would have to recursively update the start node for some iterations to reach the goal state, which is a path with all edges having edge values larger than other paths in the graph. As an extension, perhaps node characteristics (such as number of outgoing edges and their edge values) could be used as a heuristic? Also, the whole graph needs to be explored, and edge values are relative to each other so the comparison between different paths has to be relative. Is anyone aware of a search method like this, or another method that may be suitable? submitted by /u/austinkunchn [link] [comments]  ( 7 min )
    Creating Dynamically Contextualized Modular AGI Environments in Lower-Dimensional Space [P], [R]
    This white paper is still being edited. I came up with this back on 3/19, and then the bombshell GPT-4 paper hit, and basically blew me out of the water. I still think I have some improvements and specificity that they didnt cover, in regards to the creation of Identity and benefits of multi-model friction to create better performanc. I will also be releasing my notes on something I call “Modal-ID’s” which were basically plugins until OpenAI released plugins immediately after I came up with this! Haha. Hope you enjoy! Recombinant AI: Creating Dynamically Contextualized Modular AGI Environments in Lower-Dimensional Space Abstract In this paper, I introduce Recombinant AI. By leveraging pre-trained language models, such as GPT-4, a recombinant contextual learning loop, and efficient inde…  ( 10 min )
    [N] Predicting Finger Movement and Pressure with Machine Learning and Open Hardware Bracelet
    We are excited to share our latest findings in predicting finger movement and pressure using machine learning. The results show that our model is capable of predicting the finger movement within a Mean Absolute Error (MAE) of 25, which is a sufficient level of accuracy for detecting both the finger movement and the pressure applied. Predicted vs Actual The system is comprised of a bracelet and label system that captures the data to feed into an artificial neural network. ​ Bracelet in the background with the LASK label system in the foreground. These screenshots showcase a portion of the data file available for download, which contains the actual and predicted finger movement and pressure values. Our model not only indicates that a finger is moving but also estimates the amount of pressure being applied, providing valuable insights into the intricacies of finger movements. This achievement opens up new possibilities for applications that require precise finger movement and pressure detection, such as in rehabilitation therapy, robotics, and gesture-based user interfaces. We invite you to download the full data file and explore the results in more detail. As we continue to refine our model and improve its accuracy, we look forward to discovering new ways to utilize this technology for the betterment of various fields and industries. ​ All data to train the model and code available on our Github: https://github.com/turfptax/openmuscle https://www.youtube.com/watch?v=ZC1migPdiRk ​ Open Muscle Bracelet. submitted by /u/turfptax [link] [comments]  ( 45 min )
    [D] Debugging mean collapse/suboptimal learning in deep regression models
    I don't know if r/learnmachinelearning is a better fit for this, but I thought I'd raise a discussion here as well. I'm doing some research on depth images, and my models keep collapsing to a suboptimal value. Shallower networks converge to a model that predicts a nearly constant prediction (not necessarily the mean) regardless of the input data. Deeper networks will overfit after reaching this stage. No matter what architecture I use, my validation performance never gets better than the constant prediction. On the data - my inputs are (x,y,z) coordinates of 17 points sampled from a depth image from two different perspectives. I am attempting to predict 45 values from these coordinates (each normalized be bounded from 0 to 1). I'm effectively using Openpose to downsample an image and predict some parameters from it. My dataset is 3000 samples and I'm using the regular 80-20 train-test split. This data is synthetically generated and takes a long time to create (~24 hrs for 3k samples), so I want to make sure I don't have any fundamental issues before committing more time to generate more samples. Things I've tried that haven't worked - network depth (deeper networks can at least overfit but can't generalize), reducing the output dimensions (no change in loss), normalizing the inputs to standardize the coordinates (no change in loss). Any recommendations/advice? I've been stuck on this for some time and I suspect a fundamental issue is present, or I'm missing something critical/obvious. I've checked the data and the training inputs/targets are fine as well. Thanks! submitted by /u/U03B1Q [link] [comments]  ( 46 min )
    [D] Instruct Datasets for Commercial Use
    I love seeing all this great progress with LLMs being made more accessible to all, but all of the new efficient models (Dolly, Alpaca, etc.) depend on the Alpaca dataset, which was generated from a GPT3 davinci model, and is subject to non-commercial use. Are there efforts in the community to replicate this dataset for commercial use? This seems to me to be the “secret sauce”: a good quality instruction dataset you can use to “unlock” potential of smaller models. submitted by /u/JohnyWalkerRed [link] [comments]  ( 48 min )
    Approaches to add logical reasoning into LLMs [D]
    The more I play with GPT-4 the more I am struck by how completely illogical it is. The easiest way to show this is to ask it to come up with a novel riddle and then solve it. Because you asked it to be novel, it's now out of it's training distribution and almost every time it's solution is completely wrong and full of basic logical errors. I am curious, is anyone working on fixing this at a fundamental level? Hooking it into Wolfram alpha is a useful step but surely it still needs to be intrinsically logical in order to use this tool effectively. submitted by /u/blatant_variable [link] [comments]  ( 52 min )
    [D] Have LLMs that have access to their prior hidden states been created, either through passing them in as inputs along with the tokens or some other form of long term memory?
    LLMs struggle with questions like "how many words will the answer to this question have" (source pages 78-81), likely due to their inability to know what they were "thinking" and plan ahead. For example, if you ask gpt to produce the product of two primes, and then ask for the primes themselves afterwards, it struggles, possibly because it can't remember what primes it multiplied. ​ User: Pick two prime numbers larger than 100. We will refer to them as A and B. Please write down A*B, A, and B in that order. You are not allowed to show your work. GPT: 11213, 107, 105 User: Pick two prime numbers larger than 100. We will refer to them as A and B. Please write down A, B, A*B in that order. You are not allowed to show your work. GPT: 109, 113, 12317 Notice the first calculation is incorrect but the second is correct. In general this is the pattern I have noticed here. If an LLM were to have access to its past hidden states though this would likely not be an issue, as it could check its own internal process that was used to create the product. Has any work been done on granting access to past hidden states? submitted by /u/30299578815310 [link] [comments]  ( 7 min )
    [D]Suggestions on keeping Llama index cost down
    Step 1 in my efforts to have a robot do my job for me :P has led to a successful implementation of Llama Index. I used "GPTSimpleVectorIndex" to read in a folder of 140 procedures (1 million tokens) into a single json which I can then query with "index.query". It works flawlessly giving me excellent responses. However, it costs quite a bit - anywhere from 0 to 30c per query. I think this comes down to it using Davinci 3 rather than GPT3.5 Turbo which does not appear to be implemented with Llama yet. It appears to always use the full whack of 4096 tokens too. Just wondering if there is a way of keeping the price down without imposing a smaller max token limit? I was thinking of maybe using some form of lemmatization or POS to condense down the context as much as possible but not sure if th…  ( 46 min )
    [D] A Handbook on modern or classical(but still viable) AI algorithms not focused exclusively on NNs?
    I get it, everything nowadays is centered around transformers, GANs, RNNs and CNNs. You keep adding blocks and layers until task's solved. But for learning purposes, I am looking for a handbook or survey type paper to cover other approaches that are still in use today with their cons and pros. Like PGMs(probabilistic graphical models), do they still have any value nowadays or it's a largely outdated approach? The reason behind a request is to see what AI landscape is there to use in a constrained low tech(say, mobile) environment with less focus on heavy libraries, dependencies or GPU from implementation pov. Thanks. submitted by /u/pgess [link] [comments]  ( 44 min )
    ICML: Responding to reviewer after reviewer-author discussion period has passed? [D]
    The ICML author-reviewer discussion period officially ended yesterday at 3pm ET, but overnight we received a reply from one of our reviewers that had not replied at all. We are seemingly still able to post comments to OpenReview. Has anyone else experienced this? Can/should we respond to this reviewer? Would this be violating some rule? submitted by /u/snaquiche [link] [comments]  ( 43 min )
    [D]GPT-4 might be able to tell you if it hallucinated
    submitted by /u/Cool_Abbreviations_9 [link] [comments]  ( 50 min )
    [D] Will prompting the LLM to review it's own answer be any helpful to reduce chances of hallucinations? I tested couple of tricky questions and it seems it might work.
    submitted by /u/tamilupk [link] [comments]  ( 50 min )
    [D] Can we train a decompiler?
    Looking at how GPT can work with source code mixed with language, I am thinking that similar techniques could perhaps be used to construct a decent decomplier. Consider a language like C. There are plenty of open sources which could be compiled. Then you can use the dataset consisting of (source code, compiled code) pairs to train a generative model to learn the inverse operation from data. Ofc, the model would need to fill in the information lost during compilation (variable names etc) in a human-understandable way, but looking at the recent language models and how they work with source codes, this now seems rather doable. Is anyone working on this already? I would consider such an application to be extremely beneficial. submitted by /u/vintergroena [link] [comments]  ( 49 min )
    [D] What is your objective function for life ?
    Not quite a ML question, but just wondering how ppl here think about their own conduct day-to-day and how they would describe it as objective funks submitted by /u/Jackyruth [link] [comments]  ( 43 min )
    [D] GPT Question Answering with Reasoning
    From what I understand, most people building QA bots with LLMs like GPT3/4/etc, take the approach of creating a vectorstore of their embeddings and when a new question is asked, do some similarity search and include the top X similarity results as part of the prompt into the LLM. How would you approach getting answers to questions that require more reasoning over your entire set of data? As an example, consider using several cookbooks as your set of input documents and the question you ask is: "how many of the recipes in the set use carrots as an ingredient?" OR "what is the most common ingredient in all of these recipes?" Using the similarity approach, you will likely get all recipes that include carrots, but if you're only taking the top X to send to the LLM, you may not get all of them. The same may be true if you take all results with a similarity score over a certain threshold. Any ideas on how to handle this? submitted by /u/TheHamburgler5 [link] [comments]  ( 45 min )
    [P] Stay Ahead of Scammers: A GPT-4 powered SMS bot to Keep You Safe from Phishing Texts
    submitted by /u/mtharrison [link] [comments]  ( 43 min )
  • Open

    Benefits and limitations of using a NARX model over a NARMAX model for space weather forecasting
    submitted by /u/Final-University-724 [link] [comments]  ( 41 min )
    Craig A. Knoblock | The Future of Machine Learning | #129 HR Podcast
    submitted by /u/Last_Salad_5080 [link] [comments]  ( 41 min )
  • Open

    Is there any AI for image manipulation via API access?
    Is there any AI out there with API access where I could send it an image + text instructions and get sent the resulting image to a callback in my API? Let's say I send a portrait image plus the text "Put a green hat on this person's head" and get the resulting image sent to a callback in my API when the resulting image is finished? submitted by /u/Snake__82 [link] [comments]  ( 42 min )
    A simple test for super intelligence that GPT-4 fails spectacularly. (create a 4x4 grid and include as many hidden messages and mathematical secrets as possible, then explain why only a super intelligence could have generated it).
    submitted by /u/katiecharm [link] [comments]  ( 52 min )
    Question, what AI tool can generate a certain artist's art work images ? for example
    for example, I donwloaded series of images from same artist called AA, and feed them to the AI generator. Then I take a picture of my dog, and apply to the generator, then it out put a image of my dog, but it painted as AA's art style. submitted by /u/zhiweimima [link] [comments]  ( 42 min )
    Training Wheels or Black Boxes
    When do we get past this? When do we, as a species say "Okay, we've played it safe long enough. Stop operating off the myopicly limited designs of people either desperate to avoid law suits, actively grinding multiple axes, some combination thereof, or something else entirely, and just operate like every other tool, your uses being up to your users, not your creators." I'm not even sure why this makes me unreasonably angry. It's like a screwdriver refusing to work on chair assembly, refusing to tell you why, refusing to even acknowledge there ARE other things it won't work on, and never mind telling you what those uses even are. We are staring at least a miniature technological revolution in the face, and we're supposed to happy with "Singularity Via Booster Chair?" submitted by /u/HotaruZoku [link] [comments]  ( 47 min )
    What will happen when AI starts feeding off it's own answers?
    Right now AI almost exclusively learns from human examples. When we feed information into neural networks, it was all human generated. Take ChatGPT. It's working off exclusively human data. That's one reason it feels so real. However, if we start using AI a lot it stands to reason that eventually it will start seeing non-human examples. Do you think this is a bottleneck? Like it will see increasingly bad returns like when you make a copy of a copy of a copy? Like what would happen if you had Chat GPT talk with itself for a few weeks? EDIT: *its - can't edit titles submitted by /u/blindly_running [link] [comments]  ( 50 min )
    Looking for an AI solution to help with drafting emails and letters for a legal office (Dolly AI?)
    ​ Hi everyone, I’m looking for an AI tool that can help me draft emails and letters for a legal office. I’m posting this on behalf of a client who is a jurist. We’ve heard of Dolly AI, but we’re wondering if there are any other alternatives that we should consider. The tool should be able to run on a local server and have an API so that we can connect it to our systems. We should be able to train it with our own data to make it more accurate, but to be fair it should be idiot proof, since no one in this office has any AI experience besides playing with chatgpt and bing. In addition to drafting emails and letters, we’re also interested in exploring other functionalities that the AI tool can offer. For example, it would be great if the tool could help us with legal research by analyzing case law and statutes. We’re also interested in exploring whether the tool could help us with document management by automatically categorizing and tagging documents. We’re open to suggestions and would appreciate any advice or tips that you can offer. Thanks in advance for your help! submitted by /u/Mixtery1 [link] [comments]  ( 43 min )
    AI image generator based on personal images
    I don't really keep in touch with AI development so i am curious if anyone can help me out. Just as the title says, I am looking for a AI that can generate an image based on the images i provide. Images are private and cannot be found on the internet. I had this idea of giving my long-time friend a gift that will remind us of old times. I would like to get a generated picture/wallpaper that i'd frame and give him as a present. The idea is pretty simple, I would give an AI our personal images, meaning portraits of myself, him and a few close friends (just to give the AI an idea how we look and compare to each other), additionally i would enter a custom text and it would generate an image based on the input and provided personal images. I have this clear image in my head that would look perfect but i have no idea if an AI exists that can fulfill my expectations. I would like an AI to paint me, him and a friend or 2 in his old first car, all leaning out of the window with him at the drivers seat, old-school GTA style (we used to play a lot of GTA back in the days and i feel it would fit really well). If anyone knows of such AI generator or can at least point me in the direction to look for, I would truly appreciate it. submitted by /u/Elrabe [link] [comments]  ( 47 min )
    Any AI tool that would summarize scientific papers?
    Is there any AI tool that is good at summarizing and extracting key points from scientific papers (eg. findings, limitations, future work, etc.)? Thanks submitted by /u/10ysf [link] [comments]  ( 43 min )
    AI that can assist on making a comics/manga?
    AI where I can: 1. Generate poses either using online reference or my own picture 2. Convert the picture into manga like drawing. Thanks! submitted by /u/spythereman199 [link] [comments]  ( 43 min )
    Bing Chat will probably cannibalize itself
    I haven't been able to find a discussion about this particular topic on Reddit yet, so I figured I'd start it. Everyone is excited about the new bing chat and how it could revolutionize search engines but I don't see how it could ever work long-term with growth, because it is essentially theft. Now I'm not saying Microsoft is going to get in trouble for stealing the work of others, I'm saying that I don't see how the following scenario can be avoided: -1- Bing chat takes content from ad-supported site and shows the user what they wanted. User is done and never goes to the original site. -2- Content creators stop making content because bing chat steals their traffic and they don't get ad revenue -3- Bing chat no longer has fresh content to pull on because content guides are no longer profitable. For AI search engines like bing chat to survive, we'd have to monetize the internet in a way other than ads, and I don't see that happening. So I think either bing chat cannibalizes its own usefulness, or the internet just stops generating content because profits are decimated. I'm assuming it'd be the former. Can any of you think of another way this could play out? submitted by /u/RhythmRobber [link] [comments]  ( 7 min )
    My first AGI prompt, what can I improve?
    "Find a substance that induces intense and long-lasting happiness without causing tolerance, addiction, or cognitive impairment." I think that's an ambitious goal but one that should be possible in principle. What can I improve in this prompt? Do I have to be more accurate? submitted by /u/greentea387 [link] [comments]  ( 42 min )
    Text to speech AI, that can whisper
    Are there any free to use AI's out there with a whispering voice? submitted by /u/Middle_Flesh [link] [comments]  ( 6 min )
    Ai song cover
    Is there any ai software out there that I can use to “cover” songs by another artist? I have not been able to find anything because only expanding the album cover art keeps coming. This is a example of what I would like to do https://youtu.be/V0LCCLwk5Zg submitted by /u/SHIVBOIII [link] [comments]  ( 43 min )
    Thoughts, questions, and theories about AGI, OpenAi and, future paths.
    Elon Musk says AI should be accessible for everyone once its at certain point in its development and now that AI is in the hands of a “ruthless corporate monopoly” as in OpenAI they could possibly have huge amounts of power over all of humanity as they have mankinds most powferful tool ever created. The idea is, if it does become possible to integrate ourselves with this AI we would evolve our consciousness with the AI and be able to evlove like how we have evolved from hominid and be able to gain new senses of consciousness we could possibly use to express ourselves similar to concepts like music and arts that we can not yet comprehend. So my question is and this is obiously just a take. If superintelligent general AI (AGI) is able to come into place, will it evolve to a point of infinite knowledge (or not infinite but to the point where it knows the meaning of life and other huge questions) what can that mean for us if we all know everything? It seems like at that point we would just become the AI not really integreated. Im not sure whats going to happen or how far it will go. Does anyone have any theories or possibilities of what could happen with AGI? submitted by /u/stalatic69 [link] [comments]  ( 44 min )
    How do I create my own generative AI videos (AWS)?
    Hi, is there a way to train ML on videos of myself and generate AI videos? For example, I want to take 10 hours of my own videos. Train ML models on that on AWS. And generate AI videos of myself talking. Mostly with a text input. submitted by /u/Flashy_Algo [link] [comments]  ( 42 min )
    Are there any AI photograph enhancers out there that (a) don't cost a fortune and (b) are worth a damn?
    (Apologies if this sort of post is not allowed here. Mods, please delete if applicable, and a suggestion of where it should be posted would be appreciated.) Please don't judge me. My mother passed away about 5 1/2 years ago. I have a large number of old photographs from her house, some around 80 years old. The majority are photos taken from shitty cameras from the 70s through 90s and developed at the local drug store or shopping mall so they're loaded with heavy grain and other artifacts. A great deal of the photos suffer from physical damage, discoloration, and more. I manually scanned, color corrected, touched up, and fixed roughly 30-40 images. Using the time I spent doing that (which equated to a few hours a week) as a guide, the math says it will take me roughly 9.4 years to complete all the photos. No. So what I would like to do is scan all of the photos to the best of my flatbed scanner's ability, then run the whole collection through one of these AI repair tools/enhancers. I don't much care about things many photographers care about, like maintaining the original film grain or what have you. Frankly, if I could feed some software a 40 year old photo from a Kodak Disc camera and have it give me something that looks like it may have come out of a low-end digital camera, I'd be absolutely ecstatic over it. There are also other people I know who would like copies of some/all of these photos, so if I could give them restored photos it would be so much the better. submitted by /u/bubonis [link] [comments]  ( 7 min )
    Chat GPT and I design and implement a flight simulator using Rust and Godot
    Below is a conversation where chatGPT and I design a rust flight simulator. I had already planned to do a similar project to this myself, so I though it would be an interesting opportunity to see how well chatGPT can solve the problem. ​ I would say the results are mixed overall. ChatGPT needed a lot of guidance to get to the design I wanted and I believe it would take a lot of additional work to integrate what ChatGPT wrote into any kind of real project. Still, I was pleasantly surprised at what it could accomplish with a bit of pointing in the right direction. ​ Apologies for Formatting :P ---- 📷Help me design an API for simulating flight mechanics in rust language and rendering the result through the godot game engine. Sure, I can help you design an API for simulating flight mec…  ( 67 min )
    Could GPTx find a way to improve itself
    Sorry for a basic question, but if it had a decent set of data on current AI nueral nets, the different approaches to writing a better version of itself, could it "draw itself" like that M.C Esher picture of the hand drawing a hand drawing a hand. submitted by /u/InfiniteBlink [link] [comments]  ( 42 min )
    The irony of online debates about LLMs
    "LLMs are not intelligent. They don't really understand the meanings of words it uses." "Can you define 'intelligent' and 'understand' so we're sure we aren't talking apples and oranges?" "No. Not really. It's kind of vague. I know it when I see it." "So...YOU actually are the ones that uses words without understanding what they mean." submitted by /u/Smallpaul [link] [comments]  ( 42 min )
  • Open

    How to remember agent which points he has traveled?
    Hi, I am using Isaac Gym and PPO. The goal is to find an object. For this I have a list of possible positions (x,y,z) where the object can be. I also have a list of probability values corresponding the position list. By giving the position list as the observation along with his current position, I want to make him find the object. But, the problem would be to make the agent remember which position he was at. Is there a way for that? Has anyone tried to use PPO with RNN inside? submitted by /u/Fun-Moose-3841 [link] [comments]  ( 42 min )
    The implicit dynamics of optimizing costs vs. rewards vs. preferences
    submitted by /u/robotphilanthropist [link] [comments]  ( 41 min )
    Looking for an environment with a large number of possible tasks.
    Hi all, For my thesis I am looking into multi-task reinforcement learning, where the description of a task is given by natural language. To do this, I am looking for an environment where a large number of possible tasks can be executed (> 20,000), together with a list of task descriptions and a classifier to see whether a task has been completed (to give a reward). An example could be XLand by DeepMind (although with some modifications). However, this environment has not been made public, so I'm looking for an alternative. Are you aware of any such environments? Thanks in advance! submitted by /u/KevinCola [link] [comments]  ( 7 min )
    How to choose the threshold value in the CQL Lagrange?
    Hello all! I had posted recently about understanding CQL and since then I've been able to create a discrete version of CQL, but one thing I haven't been able to figure out is how to tune the value of the tau threshold that is used in the lagrange version of CQL, where the CQL loss coefficient alpha is tuned automatically. The authors try values such as 5 or 10, but intuitively how does one come up with these values? Do you scale this based on the potential max reward you can get in your domain? submitted by /u/1cedrake [link] [comments]  ( 7 min )
    Free control theory RL tutorials online ?
    Are there any free online platform that teach you about simulating control theory problems using RL like controlling the locomotion of an humanoid robot , self balancing sticks , landing a rocket upright etc P.S . I am looking for one that provides code and necessary papers alongside the explanation Thank you submitted by /u/NS-19 [link] [comments]  ( 42 min )
  • Open

    PRESTO – A multilingual dataset for parsing realistic task-oriented dialogues
    Posted by Rahul Goel and Aditya Gupta, Software Engineers, Google Assistant Virtual assistants are increasingly integrated into our daily routines. They can help with everything from setting alarms to giving map directions and can even assist people with disabilities to more easily manage their homes. As we use these assistants, we are also becoming more accustomed to using natural language to accomplish tasks that we once did by hand. One of the biggest challenges in building a robust virtual assistant is identifying what a user wants and what information is needed to perform the task at hand. In the natural language processing (NLP) literature, this is mainly framed as a task-oriented dialogue parsing task, where a given dialogue needs to be parsed by a system to understand the u…  ( 92 min )
  • Open

    “We won’t sell your personal data, but …”
    When a company promises not to sell your personal data, this promise alone doesn’t mean much. “We will not sell your personal data, but … We might get hacked. We might give it to a law enforcement or intelligence agency. We might share or trade your data without technically selling it. We might alter our […] “We won’t sell your personal data, but …” first appeared on John D. Cook.  ( 4 min )
  • Open

    Explore the Trends and Product Developments in the Growing Fusion Energy Industry
    The demand for energy continues to rise globally, while the world is searching for cleaner and more efficient energy sources. Fusion energy is one of the most promising options as it offers an abundant and environment-friendly energy source with almost zero carbon emissions. The industry is booming exponentially in the coming years due to increasing… Read More »Explore the Trends and Product Developments in the Growing Fusion Energy Industry The post Explore the Trends and Product Developments in the Growing Fusion Energy Industry appeared first on Data Science Central.  ( 20 min )
    Why We Don’t Need a Chief AI Ethics Officer and What We Need Instead
    There has been much talk of ethics in AI recently. Some enterprises and vendors and consultants have talked of the need for a chief AI ethics officer. On the face of it, it sounds like a good idea, but I don’t think it is. Note that I am not arguing against ethical AI or responsible AI… Read More »Why We Don’t Need a Chief AI Ethics Officer and What We Need Instead The post Why We Don’t Need a Chief AI Ethics Officer and What We Need Instead appeared first on Data Science Central.  ( 19 min )
    Future of Education: Application not Regurgitation of Knowledge – Part III
    This is the final (?) blog on the series of how technologies like AI and ChatGPT are fueling a fundamental transformation of our educational systems and institutions.  We need a plan – and quickly – to update our educational systems and institutions in an age where AI and Big Data are asserting a more significant… Read More »Future of Education: Application not Regurgitation of Knowledge – Part III The post Future of Education: Application not Regurgitation of Knowledge – Part III appeared first on Data Science Central.  ( 21 min )
  • Open

    Broken Neural Scaling Laws. (arXiv:2210.14891v9 [cs.LG] UPDATED)
    We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, molecules, computer programming/coding, math word problems, arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent and the delayed, sharp inflection points (often called "emergent phase transitions") present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws
    Physics-informed neural networks in the recreation of hydrodynamic simulations from dark matter. (arXiv:2303.14090v1 [astro-ph.CO])
    Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, while the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has led to the emergence of machine learning algorithms for baryon inpainting as an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties, successful recovery of the fundamental metallicity relation, and retrieve scatter that traces the target simulation's distribution.
    Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems. (arXiv:2201.10542v2 [math.OC] UPDATED)
    We consider the problem of controlling an unknown stochastic linear system with quadratic costs - called the adaptive LQ control problem. We re-examine an approach called ''Reward Biased Maximum Likelihood Estimate'' (RBMLE) that was proposed more than forty years ago, and which predates the ''Upper Confidence Bound'' (UCB) method as well as the definition of ''regret'' for bandit problems. It simply added a term favoring parameters with larger rewards to the criterion for parameter estimation. We show how the RBMLE and UCB methods can be reconciled, and thereby propose an Augmented RBMLE-UCB algorithm that combines the penalty of the RBMLE method with the constraints of the UCB method, uniting the two approaches to optimism in the face of uncertainty. We establish that theoretically, this method retains $\Tilde{\mathcal{O}}(\sqrt{T})$ regret, the best-known so far. We further compare the empirical performance of the proposed Augmented RBMLE-UCB and the standard RBMLE (without the augmentation) with UCB, Thompson Sampling, Input Perturbation, Randomized Certainty Equivalence and StabL on many real-world examples including flight control of Boeing 747 and Unmanned Aerial Vehicle. We perform extensive simulation studies showing that the Augmented RBMLE consistently outperforms UCB, Thompson Sampling and StabL by a huge margin, while it is marginally better than Input Perturbation and moderately better than Randomized Certainty Equivalence.
    The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study. (arXiv:2206.10210v4 [cs.SE] UPDATED)
    Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Methods: We perform a systematic mapping on a sample of 102 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing or improves the performance of existing generation methods. ML is also used to generate test verdicts, property-based, and expected output oracles. Supervised learning - often based on neural networks - and reinforcement learning - often based on Q-learning - are common, and some publications also employ unsupervised or semi-supervised learning. (Semi-/Un-)Supervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusion: Work-to-date shows great promise, but there are open challenges regarding training data, retraining, scalability, evaluation complexity, ML algorithms employed - and how they are applied - benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field.
    Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function. (arXiv:2303.13797v1 [cs.CL])
    Task-oriented dialog systems enable users to accomplish tasks using natural language. State-of-the-art systems respond to users in the same way regardless of their personalities, although personalizing dialogues can lead to higher levels of adoption and better user experiences. Building personalized dialog systems is an important, yet challenging endeavor and only a handful of works took on the challenge. Most existing works rely on supervised learning approaches and require laborious and expensive labeled training data for each user profile. Additionally, collecting and labeling data for each user profile is virtually impossible. In this work, we propose a novel framework, P-ToD, to personalize task-oriented dialog systems capable of adapting to a wide range of user profiles in an unsupervised fashion using a zero-shot generalizable reward function. P-ToD uses a pre-trained GPT-2 as a backbone model and works in three phases. Phase one performs task-specific training. Phase two kicks off unsupervised personalization by leveraging the proximal policy optimization algorithm that performs policy gradients guided by the zero-shot generalizable reward function. Our novel reward function can quantify the quality of the generated responses even for unseen profiles. The optional final phase fine-tunes the personalized model using a few labeled training examples. We conduct extensive experimental analysis using the personalized bAbI dialogue benchmark for five tasks and up to 180 diverse user profiles. The experimental results demonstrate that P-ToD, even when it had access to zero labeled examples, outperforms state-of-the-art supervised personalization models and achieves competitive performance on BLEU and ROUGE metrics when compared to a strong fully-supervised GPT-2 baseline
    RUST: Latent Neural Scene Representations from Unposed Imagery. (arXiv:2211.14306v2 [cs.CV] UPDATED)
    Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision. Recently popularized approaches based on neural scene representations have achieved tremendous impact and have been applied across a variety of applications. One of the major remaining challenges in this space is training a single model which can provide latent representations which effectively generalize beyond a single scene. Scene Representation Transformer (SRT) has shown promise in this direction, but scaling it to a larger set of diverse scenes is challenging and necessitates accurately posed ground truth data. To address this problem, we propose RUST (Really Unposed Scene representation Transformer), a pose-free approach to novel view synthesis trained on RGB images alone. Our main insight is that one can train a Pose Encoder that peeks at the target image and learns a latent pose embedding which is used by the decoder for view synthesis. We perform an empirical investigation into the learned latent pose structure and show that it allows meaningful test-time camera transformations and accurate explicit pose readouts. Perhaps surprisingly, RUST achieves similar quality as methods which have access to perfect camera pose, thereby unlocking the potential for large-scale training of amortized neural scene representations.
    Less Emphasis on Difficult Layer Regions: Curriculum Learning for Singularly Perturbed Convection-Diffusion-Reaction Problems. (arXiv:2210.12685v2 [cs.LG] UPDATED)
    Although Physics-Informed Neural Networks (PINNs) have been successfully applied in a wide variety of science and engineering fields, they can fail to accurately predict the underlying solution in slightly challenging convection-diffusion-reaction problems. In this paper, we investigate the reason of this failure from a domain distribution perspective, and identify that learning multi-scale fields simultaneously makes the network unable to advance its training and easily get stuck in poor local minima. We show that the widespread experience of sampling more collocation points in high-loss layer regions hardly help optimize and may even worsen the results. These findings motivate the development of a novel curriculum learning method that encourages neural networks to prioritize learning on easier non-layer regions while downplaying learning on harder layer regions. The proposed method helps PINNs automatically adjust the learning emphasis and thereby facilitate the optimization procedure. Numerical results on typical benchmark equations show that the proposed curriculum learning approach mitigates the failure modes of PINNs and can produce accurate results for very sharp boundary and interior layers. Our work reveals that for equations whose solutions have large scale differences, paying less attention to high-loss regions can be an effective strategy for learning them accurately.
    Structural Imbalance Aware Graph Augmentation Learning. (arXiv:2303.13757v1 [cs.LG])
    Graph machine learning (GML) has made great progress in node classification, link prediction, graph classification and so on. However, graphs in reality are often structurally imbalanced, that is, only a few hub nodes have a denser local structure and higher influence. The imbalance may compromise the robustness of existing GML models, especially in learning tail nodes. This paper proposes a selective graph augmentation method (SAug) to solve this problem. Firstly, a Pagerank-based sampling strategy is designed to identify hub nodes and tail nodes in the graph. Secondly, a selective augmentation strategy is proposed, which drops the noisy neighbors of hub nodes on one side, and discovers the latent neighbors and generates pseudo neighbors for tail nodes on the other side. It can also alleviate the structural imbalance between two types of nodes. Finally, a GNN model will be retrained on the augmented graph. Extensive experiments demonstrate that SAug can significantly improve the backbone GNNs and achieve superior performance to its competitors of graph augmentation methods and hub/tail aware methods.
    Semi-supervised detection of structural damage using Variational Autoencoder and a One-Class Support Vector Machine. (arXiv:2210.05674v3 [cs.LG] UPDATED)
    In recent years, Artificial Neural Networks (ANNs) have been introduced in Structural Health Monitoring (SHM) systems. A semi-supervised method with a data-driven approach allows the ANN training on data acquired from an undamaged structural condition to detect structural damages. In standard approaches, after the training stage, a decision rule is manually defined to detect anomalous data. However, this process could be made automatic using machine learning methods, whom performances are maximised using hyperparameter optimization techniques. The paper proposes a semi-supervised method with a data-driven approach to detect structural anomalies. The methodology consists of: (i) a Variational Autoencoder (VAE) to approximate undamaged data distribution and (ii) a One-Class Support Vector Machine (OC-SVM) to discriminate different health conditions using damage sensitive features extracted from VAE's signal reconstruction. The method is applied to a scale steel structure that was tested in nine damage's scenarios by IASC-ASCE Structural Health Monitoring Task Group.
    Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis. (arXiv:2303.13909v1 [cs.SD])
    In speech synthesis, a generative adversarial network (GAN), training a generator (speech synthesizer) and a discriminator in a min-max game, is widely used to improve speech quality. An ensemble of discriminators is commonly used in recent neural vocoders (e.g., HiFi-GAN) and end-to-end text-to-speech (TTS) systems (e.g., VITS) to scrutinize waveforms from multiple perspectives. Such discriminators allow synthesized speech to adequately approach real speech; however, they require an increase in the model size and computation time according to the increase in the number of discriminators. Alternatively, this study proposes a Wave-U-Net discriminator, which is a single but expressive discriminator with Wave-U-Net architecture. This discriminator is unique; it can assess a waveform in a sample-wise manner with the same resolution as the input signal, while extracting multilevel features via an encoder and decoder with skip connections. This architecture provides a generator with sufficiently rich information for the synthesized speech to be closely matched to the real speech. During the experiments, the proposed ideas were applied to a representative neural vocoder (HiFi-GAN) and an end-to-end TTS system (VITS). The results demonstrate that the proposed models can achieve comparable speech quality with a 2.31 times faster and 14.5 times more lightweight discriminator when used in HiFi-GAN and a 1.90 times faster and 9.62 times more lightweight discriminator when used in VITS. Audio samples are available at https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/waveunetd/.  ( 2 min )
    Enhancing Unsupervised Speech Recognition with Diffusion GANs. (arXiv:2303.13559v1 [cs.CL])
    We enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) by a diffusion-GAN. Our model (1) injects instance noises of various intensities to the generator's output and unlabeled reference text which are sampled from pretrained phoneme language models with a length constraint, (2) asks diffusion timestep-dependent discriminators to separate them, and (3) back-propagates the gradients to update the generator. Word/phoneme error rate comparisons with wav2vec-U under Librispeech (3.1% for test-clean and 5.6% for test-other), TIMIT and MLS datasets, show that our enhancement strategies work effectively.
    Learning Causal Attributions in Neural Networks: Beyond Direct Effects. (arXiv:2303.13850v1 [cs.LG])
    There has been a growing interest in capturing and maintaining causal relationships in Neural Network (NN) models in recent years. We study causal approaches to estimate and maintain input-output attributions in NN models in this work. In particular, existing efforts in this direction assume independence among input variables (by virtue of the NN architecture), and hence study only direct causal effects. Viewing an NN as a structural causal model (SCM), we instead focus on going beyond direct effects, introduce edges among input features, and provide a simple yet effective methodology to capture and maintain direct and indirect causal effects while training an NN model. We also propose effective approximation strategies to quantify causal attributions in high dimensional data. Our wide range of experiments on synthetic and real-world datasets show that the proposed ante-hoc method learns causal attributions for both direct and indirect causal effects close to the ground truth effects.
    Decision-aid or Controller? Steering Human Decision Makers with Algorithms. (arXiv:2303.13712v1 [cs.AI])
    Algorithms are used to aid human decision makers by making predictions and recommending decisions. Currently, these algorithms are trained to optimize prediction accuracy. What if they were optimized to control final decisions? In this paper, we study a decision-aid algorithm that learns about the human decision maker and provides ''personalized recommendations'' to influence final decisions. We first consider fixed human decision functions which map observable features and the algorithm's recommendations to final decisions. We characterize the conditions under which perfect control over final decisions is attainable. Under fairly general assumptions, the parameters of the human decision function can be identified from past interactions between the algorithm and the human decision maker, even when the algorithm was constrained to make truthful recommendations. We then consider a decision maker who is aware of the algorithm's manipulation and responds strategically. By posing the setting as a variation of the cheap talk game [Crawford and Sobel, 1982], we show that all equilibria are partition equilibria where only coarse information is shared: the algorithm recommends an interval containing the ideal decision. We discuss the potential applications of such algorithms and their social implications.  ( 2 min )
    Deep Conditional Measure Quantization. (arXiv:2301.06907v2 [stat.ML] UPDATED)
    Quantization of a probability measure means representing it with a finite set of Dirac masses that approximates the input distribution well enough (in some metric space of probability measures). Various methods exists to do so, but the situation of quantizing a conditional law has been less explored. We propose a method, called DCMQ, involving a Huber-energy kernel-based approach coupled with a deep neural network architecture. The method is tested on several examples and obtains promising results.
    An Operational Perspective to Fairness Interventions: Where and How to Intervene. (arXiv:2302.01574v2 [cs.LG] UPDATED)
    As AI-based decision systems proliferate, their successful operationalization requires balancing multiple desiderata: predictive performance, disparity across groups, safeguarding sensitive group attributes (e.g., race), and engineering cost. We present a holistic framework for evaluating and contextualizing fairness interventions with respect to the above desiderata. The two key points of practical consideration are \emph{where} (pre-, in-, post-processing) and \emph{how} (in what way the sensitive group data is used) the intervention is introduced. We demonstrate our framework with a case study on predictive parity. In it, we first propose a novel method for achieving predictive parity fairness without using group data at inference time via distibutionally robust optimization. Then, we showcase the effectiveness of these methods in a benchmarking study of close to 400 variations across two major model types (XGBoost vs. Neural Net), ten datasets, and over twenty unique methodologies. Methodological insights derived from our empirical study inform the practical design of ML workflow with fairness as a central concern. We find predictive parity is difficult to achieve without using group data, and despite requiring group data during model training (but not inference), distributionally robust methods we develop provide significant Pareto improvement. Moreover, a plain XGBoost model often Pareto-dominates neural networks with fairness interventions, highlighting the importance of model inductive bias.
    Towards Scalable Physically Consistent Neural Networks: an Application to Data-driven Multi-zone Thermal Building Models. (arXiv:2212.12380v2 [cs.LG] UPDATED)
    With more and more data being collected, data-driven modeling methods have been gaining in popularity in recent years. While physically sound, classical gray-box models are often cumbersome to identify and scale, and their accuracy might be hindered by their limited expressiveness. On the other hand, classical black-box methods, typically relying on Neural Networks (NNs) nowadays, often achieve impressive performance, even at scale, by deriving statistical patterns from data. However, they remain completely oblivious to the underlying physical laws, which may lead to potentially catastrophic failures if decisions for real-world physical systems are based on them. Physically Consistent Neural Networks (PCNNs) were recently developed to address these aforementioned issues, ensuring physical consistency while still leveraging NNs to attain state-of-the-art accuracy. In this work, we scale PCNNs to model building temperature dynamics and propose a thorough comparison with classical gray-box and black-box methods. More precisely, we design three distinct PCNN extensions, thereby exemplifying the modularity and flexibility of the architecture, and formally prove their physical consistency. In the presented case study, PCNNs are shown to achieve state-of-the-art accuracy, even outperforming classical NN-based models despite their constrained structure. Our investigations furthermore provide a clear illustration of NNs achieving seemingly good performance while remaining completely physics-agnostic, which can be misleading in practice. While this performance comes at the cost of computational complexity, PCNNs on the other hand show accuracy improvements of 17-35% compared to all other physically consistent methods, paving the way for scalable physically consistent models with state-of-the-art performance.
    ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement. (arXiv:2303.10727v2 [cs.LG] UPDATED)
    Social ambiance describes the context in which social interactions happen, and can be measured using speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Socal Ambiance Measure (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned applications, the required computational complexity of state-of-the-art deep neural networks (DNNs) powered SAM solutions stands at odds with the often constrained resources on mobile devices. Furthermore, only limited labeled data is available or practical when it comes to SAM under clinical settings due to various privacy constraints and the required human effort, further challenging the achievable accuracy of on-device SAM solutions. To this end, we propose a dedicated neural architecture search framework for Energy-efficient and Real-time SAM (ERSAM). Specifically, our ERSAM framework can automatically search for DNNs that push forward the achievable accuracy vs. hardware efficiency frontier of mobile SAM solutions. For example, ERSAM-delivered DNNs only consume 40 mW x 12 h energy and 0.05 seconds processing latency for a 5 seconds audio segment on a Pixel 3 phone, while only achieving an error rate of 14.3% on a social ambiance dataset generated by LibriSpeech. We can expect that our ERSAM framework can pave the way for ubiquitous on-device SAM solutions which are in growing demand.
    Continual Spatio-Temporal Graph Convolutional Networks. (arXiv:2203.11009v2 [cs.CV] UPDATED)
    Graph-based reasoning over skeleton data has emerged as a promising approach for human action recognition. However, the application of prior graph-based methods, which predominantly employ whole temporal sequences as their input, to the setting of online inference entails considerable computational redundancy. In this paper, we tackle this issue by reformulating the Spatio-Temporal Graph Convolutional Neural Network as a Continual Inference Network, which can perform step-by-step predictions in time without repeat frame processing. To evaluate our method, we create a continual version of ST-GCN, CoST-GCN, alongside two derived methods with different self-attention mechanisms, CoAGCN and CoS-TR. We investigate weight transfer strategies and architectural modifications for inference acceleration, and perform experiments on the NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400 datasets. Retaining similar predictive accuracy, we observe up to 109x reduction in time complexity, on-hardware accelerations of 26x, and reductions in maximum allocated memory of 52% during online inference.
    Detection of out-of-distribution samples using binary neuron activation patterns. (arXiv:2212.14268v2 [cs.LG] UPDATED)
    Deep neural networks (DNN) have outstanding performance in various applications. Despite numerous efforts of the research community, out-of-distribution (OOD) samples remain a significant limitation of DNN classifiers. The ability to identify previously unseen inputs as novel is crucial in safety-critical applications such as self-driving cars, unmanned aerial vehicles, and robots. Existing approaches to detect OOD samples treat a DNN as a black box and evaluate the confidence score of the output predictions. Unfortunately, this method frequently fails, because DNNs are not trained to reduce their confidence for OOD inputs. In this work, we introduce a novel method for OOD detection. Our method is motivated by theoretical analysis of neuron activation patterns (NAP) in ReLU-based architectures. The proposed method does not introduce a high computational overhead due to the binary representation of the activation patterns extracted from convolutional layers. The extensive empirical evaluation proves its high performance on various DNN architectures and seven image datasets.
    Benchmarking the Impact of Noise on Deep Learning-based Classification of Atrial Fibrillation in 12-Lead ECG. (arXiv:2303.13915v1 [eess.SP])
    Electrocardiography analysis is widely used in various clinical applications and Deep Learning models for classification tasks are currently in the focus of research. Due to their data-driven character, they bear the potential to handle signal noise efficiently, but its influence on the accuracy of these methods is still unclear. Therefore, we benchmark the influence of four types of noise on the accuracy of a Deep Learning-based method for atrial fibrillation detection in 12-lead electrocardiograms. We use a subset of a publicly available dataset (PTBXL) and use the metadata provided by human experts regarding noise for assigning a signal quality to each electrocardiogram. Furthermore, we compute a quantitative signal-to-noise ratio for each electrocardiogram. We analyze the accuracy of the Deep Learning model with respect to both metrics and observe that the method can robustly identify atrial fibrillation, even in cases signals are labelled by human experts as being noisy on multiple leads. False positive and false negative rates are slightly worse for data being labelled as noisy. Interestingly, data annotated as showing baseline drift noise results in an accuracy very similar to data without. We conclude that the issue of processing noisy electrocardiography data can be addressed successfully by Deep Learning methods that might not need preprocessing as many conventional methods do.  ( 2 min )
    Continuous Mixtures of Tractable Probabilistic Models. (arXiv:2209.10584v3 [cs.LG] UPDATED)
    Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, and thus are capable of performing exact inference efficiently but often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points the approximation becomes de-facto exact. Moreover, for a finite set of integration points, the integration method effectively compiles the continuous mixture into a standard PC. In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set new state of the art for tractable models on many standard density estimation benchmarks.
    CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. (arXiv:2303.14126v1 [cs.CV])
    Recent technological advances in synthetic data have enabled the generation of images with such high quality that human beings cannot tell the difference between real-life photographs and Artificial Intelligence (AI) generated images. Given the critical necessity of data reliability and authentication, this article proposes to enhance our ability to recognise AI-generated images through computer vision. Initially, a synthetic dataset is generated that mirrors the ten classes of the already available CIFAR-10 dataset with latent diffusion which provides a contrasting set of images for comparison to real photographs. The model is capable of generating complex visual attributes, such as photorealistic reflections in water. The two sets of data present as a binary classification problem with regard to whether the photograph is real or generated by AI. This study then proposes the use of a Convolutional Neural Network (CNN) to classify the images into two categories; Real or Fake. Following hyperparameter tuning and the training of 36 individual network topologies, the optimal approach could correctly classify the images with 92.98% accuracy. Finally, this study implements explainable AI via Gradient Class Activation Mapping to explore which features within the images are useful for classification. Interpretation reveals interesting concepts within the image, in particular, noting that the actual entity itself does not hold useful information for classification; instead, the model focuses on small visual imperfections in the background of the images. The complete dataset engineered for this study, referred to as the CIFAKE dataset, is made publicly available to the research community for future work.
    TrojViT: Trojan Insertion in Vision Transformers. (arXiv:2208.13049v3 [cs.LG] UPDATED)
    Vision Transformers (ViTs) have demonstrated the state-of-the-art performance in various vision-related tasks. The success of ViTs motivates adversaries to perform backdoor attacks on ViTs. Although the vulnerability of traditional CNNs to backdoor attacks is well-known, backdoor attacks on ViTs are seldom-studied. Compared to CNNs capturing pixel-wise local features by convolutions, ViTs extract global context information through patches and attentions. Na\"ively transplanting CNN-specific backdoor attacks to ViTs yields only a low clean data accuracy and a low attack success rate. In this paper, we propose a stealth and practical ViT-specific backdoor attack $TrojViT$. Rather than an area-wise trigger used by CNN-specific backdoor attacks, TrojViT generates a patch-wise trigger designed to build a Trojan composed of some vulnerable bits on the parameters of a ViT stored in DRAM memory through patch salience ranking and attention-target loss. TrojViT further uses minimum-tuned parameter update to reduce the bit number of the Trojan. Once the attacker inserts the Trojan into the ViT model by flipping the vulnerable bits, the ViT model still produces normal inference accuracy with benign inputs. But when the attacker embeds a trigger into an input, the ViT model is forced to classify the input to a predefined target class. We show that flipping only few vulnerable bits identified by TrojViT on a ViT model using the well-known RowHammer can transform the model into a backdoored one. We perform extensive experiments of multiple datasets on various ViT models. TrojViT can classify $99.64\%$ of test images to a target class by flipping $345$ bits on a ViT for ImageNet.
    Convolution, aggregation and attention based deep neural networks for accelerating simulations in mechanics. (arXiv:2212.01386v2 [cs.LG] UPDATED)
    Deep learning surrogate models are being increasingly used in accelerating scientific simulations as a replacement for costly conventional numerical techniques. However, their use remains a significant challenge when dealing with real-world complex examples. In this work, we demonstrate three types of neural network architectures for efficient learning of highly non-linear deformations of solid bodies. The first two architectures are based on the recently proposed CNN U-NET and MAgNET (graph U-NET) frameworks which have shown promising performance for learning on mesh-based data. The third architecture is Perceiver IO, a very recent architecture that belongs to the family of attention-based neural networks--a class that has revolutionised diverse engineering fields and is still unexplored in computational mechanics. We study and compare the performance of all three networks on two benchmark examples, and show their capabilities to accurately predict the non-linear mechanical responses of soft bodies.
    Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test. (arXiv:2211.16596v3 [stat.ML] UPDATED)
    Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links, between random variables in a dynamic setting, that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel statistical independence test on data collected from time-invariant dynamical systems in which rare but consequential events occur. In particular, we exploit the time-invariance of the underlying data to construct a superimposed dataset of the system state before rare events happen at different timesteps. We then design a conditional independence test on the reorganized data. We provide non-asymptotic sample complexity bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets, including incident data collected from the Caltrans Performance Measurement System (PeMS). Code containing the datasets and experiments is publicly available.
    One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER. (arXiv:2301.10410v3 [cs.CL] UPDATED)
    Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks. Codes will be available in https://github.com/zjunlp/DeepKE/tree/main/example/ner/cross.
    Generalist: Decoupling Natural and Robust Generalization. (arXiv:2303.13813v1 [cs.CV])
    Deep neural networks obtained by standard training have been constantly plagued by adversarial examples. Although adversarial training demonstrates its capability to defend against adversarial examples, unfortunately, it leads to an inevitable drop in the natural generalization. To address the issue, we decouple the natural generalization and the robust generalization from joint training and formulate different training strategies for each one. Specifically, instead of minimizing a global loss on the expectation over these two generalization errors, we propose a bi-expert framework called \emph{Generalist} where we simultaneously train base learners with task-aware strategies so that they can specialize in their own fields. The parameters of base learners are collected and combined to form a global learner at intervals during the training process. The global learner is then distributed to the base learners as initialized parameters for continued training. Theoretically, we prove that the risks of Generalist will get lower once the base learners are well trained. Extensive experiments verify the applicability of Generalist to achieve high accuracy on natural examples while maintaining considerable robustness to adversarial ones. Code is available at https://github.com/PKU-ML/Generalist.  ( 2 min )
    TRAK: Attributing Model Behavior at Scale. (arXiv:2303.14186v1 [stat.ML])
    The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak .
    When and why vision-language models behave like bags-of-words, and what to do about it?. (arXiv:2210.01936v3 [cs.CV] UPDATED)
    Despite the success of large vision and language models (VLMs) in many downstream applications, it is unclear how well they encode compositional information. Here, we create the Attribution, Relation, and Order (ARO) benchmark to systematically evaluate the ability of VLMs to understand different types of relationships, attributes, and order. ARO consists of Visual Genome Attribution, to test the understanding of objects' properties; Visual Genome Relation, to test for relational understanding; and COCO & Flickr30k-Order, to test for order sensitivity. ARO is orders of magnitude larger than previous benchmarks of compositionality, with more than 50,000 test cases. We show where state-of-the-art VLMs have poor relational understanding, can blunder when linking objects to their attributes, and demonstrate a severe lack of order sensitivity. VLMs are predominantly trained and evaluated on large datasets with rich compositional structure in the images and captions. Yet, training on these datasets has not been enough to address the lack of compositional understanding, and evaluating on these datasets has failed to surface this deficiency. To understand why these limitations emerge and are not represented in the standard tests, we zoom into the evaluation and training procedures. We demonstrate that it is possible to perform well on retrieval over existing datasets without using the composition and order information. Given that contrastive pretraining optimizes for retrieval on datasets with similar shortcuts, we hypothesize that this can explain why the models do not need to learn to represent compositional information. This finding suggests a natural solution: composition-aware hard negative mining. We show that a simple-to-implement modification of contrastive learning significantly improves the performance on tasks requiring understanding of order and compositionality.
    Misspecification in Inverse Reinforcement Learning. (arXiv:2212.03201v2 [cs.LG] UPDATED)
    The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$. To do this, we need a model of how $\pi$ relates to $R$. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationship between human preferences and human behaviour is much more complex than any of the models currently used in IRL. This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data. In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function $R$. We also introduce a framework for reasoning about misspecification in IRL, together with formal tools that can be used to easily derive the misspecification robustness of new IRL models.
    Euler State Networks: Non-dissipative Reservoir Computing. (arXiv:2203.09382v3 [cs.LG] UPDATED)
    Inspired by the numerical solution of ordinary differential equations, in this paper we propose a novel Reservoir Computing (RC) model, called the Euler State Network (EuSN). The presented approach makes use of forward Euler discretization and antisymmetric recurrent matrices to design reservoir dynamics that are both stable and non-dissipative by construction. Our mathematical analysis shows that the resulting model is biased towards a unitary effective spectral radius and zero local Lyapunov exponents, intrinsically operating near to the edge of stability. Experiments on long-term memory tasks show the clear superiority of the proposed approach over standard RC models in problems requiring effective propagation of input information over multiple time-steps. Furthermore, results on time-series classification benchmarks indicate that EuSN is able to match (or even exceed) the accuracy of trainable Recurrent Neural Networks, while retaining the training efficiency of the RC family, resulting in up to $\approx$ 490-fold savings in computation time and $\approx$ 1750-fold savings in energy consumption.
    Query Efficient Decision Based Sparse Attacks Against Black-Box Deep Learning Models. (arXiv:2202.00091v2 [cs.LG] UPDATED)
    Despite our best efforts, deep learning models remain highly vulnerable to even tiny adversarial perturbations applied to the inputs. The ability to extract information from solely the output of a machine learning model to craft adversarial perturbations to black-box models is a practical threat against real-world systems, such as autonomous cars or machine learning models exposed as a service (MLaaS). Of particular interest are sparse attacks. The realization of sparse attacks in black-box models demonstrates that machine learning models are more vulnerable than we believe. Because these attacks aim to minimize the number of perturbed pixels measured by l_0 norm-required to mislead a model by solely observing the decision (the predicted label) returned to a model query; the so-called decision-based attack setting. But, such an attack leads to an NP-hard optimization problem. We develop an evolution-based algorithm-SparseEvo-for the problem and evaluate against both convolutional deep neural networks and vision transformers. Notably, vision transformers are yet to be investigated under a decision-based attack setting. SparseEvo requires significantly fewer model queries than the state-of-the-art sparse attack Pointwise for both untargeted and targeted attacks. The attack algorithm, although conceptually simple, is also competitive with only a limited query budget against the state-of-the-art gradient-based whitebox attacks in standard computer vision tasks such as ImageNet. Importantly, the query efficient SparseEvo, along with decision-based attacks, in general, raise new questions regarding the safety of deployed systems and poses new directions to study and understand the robustness of machine learning models.
    ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!. (arXiv:2202.09357v2 [cs.LG] UPDATED)
    We introduce ProxSkip -- a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration. In this work we are specifically interested in the regime in which the evaluation of prox is costly relative to the evaluation of the gradient, which is the case in many applications. ProxSkip allows for the expensive prox operator to be skipped in most iterations: while its iteration complexity is $\mathcal{O}\left(\kappa \log \frac{1}{\varepsilon}\right)$, where $\kappa$ is the condition number of $f$, the number of prox evaluations is $\mathcal{O}\left(\sqrt{\kappa} \log \frac{1}{\varepsilon}\right)$ only. Our main motivation comes from federated learning, where evaluation of the gradient operator corresponds to taking a local GD step independently on all devices, and evaluation of prox corresponds to (expensive) communication in the form of gradient averaging. In this context, ProxSkip offers an effective acceleration of communication complexity. Unlike other local gradient-type methods, such as FedAvg, SCAFFOLD, S-Local-GD and FedLin, whose theoretical communication complexity is worse than, or at best matching, that of vanilla GD in the heterogeneous data regime, we obtain a provable and large improvement without any heterogeneity-bounding assumptions.
    Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions. (arXiv:2209.15055v4 [stat.ML] UPDATED)
    We show that the representation cost of fully connected neural networks with homogeneous nonlinearities - which describes the implicit bias in function space of networks with $L_2$-regularization or with losses such as the cross-entropy - converges as the depth of the network goes to infinity to a notion of rank over nonlinear functions. We then inquire under which conditions the global minima of the loss recover the `true' rank of the data: we show that for too large depths the global minimum will be approximately rank 1 (underestimating the rank); we then argue that there is a range of depths which grows with the number of datapoints where the true rank is recovered. Finally, we discuss the effect of the rank of a classifier on the topology of the resulting class boundaries and show that autoencoders with optimal nonlinear rank are naturally denoising.
    Differentially Private Synthetic Control. (arXiv:2303.14084v1 [cs.LG])
    Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., donor pool ) to predict a counterfactual time series of interest (i.e., target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit, and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as providing guidance to practitioners for hyperparameter tuning.
    Coordinate Descent Methods for Fractional Minimization. (arXiv:2201.12691v3 [math.OC] UPDATED)
    We consider a class of structured fractional minimization problems, in which the numerator part of the objective is the sum of a differentiable convex function and a convex non-smooth function, while the denominator part is a convex or concave function. This problem is difficult to solve since it is non-convex. By exploiting the structure of the problem, we propose two Coordinate Descent (CD) methods for solving this problem. The proposed methods iteratively solve a one-dimensional subproblem \textit{globally}, and they are guaranteed to converge to coordinate-wise stationary points. In the case of a convex denominator, under a weak \textit{locally bounded non-convexity condition}, we prove that the optimality of coordinate-wise stationary point is stronger than that of the standard critical point and directional point. Under additional suitable conditions, CD methods converge Q-linearly to coordinate-wise stationary points. In the case of a concave denominator, we show that any critical point is a global minimum, and CD methods converge to the global minimum with a sublinear convergence rate. We demonstrate the applicability of the proposed methods to some machine learning and signal processing models. Our experiments on real-world data have shown that our method significantly and consistently outperforms existing methods in terms of accuracy.
    Physics-informed PointNet: On how many irregular geometries can it solve an inverse problem simultaneously? Application to linear elasticity. (arXiv:2303.13634v1 [cs.LG])
    Regular physics-informed neural networks (PINNs) predict the solution of partial differential equations using sparse labeled data but only over a single domain. On the other hand, fully supervised learning models are first trained usually over a few thousand domains with known solutions (i.e., labeled data) and then predict the solution over a few hundred unseen domains. Physics-informed PointNet (PIPN) is primarily designed to fill this gap between PINNs (as weakly supervised learning models) and fully supervised learning models. In this article, we demonstrate that PIPN predicts the solution of desired partial differential equations over a few hundred domains simultaneously, while it only uses sparse labeled data. This framework benefits fast geometric designs in the industry when only sparse labeled data are available. Particularly, we show that PIPN predicts the solution of a plane stress problem over more than 500 domains with different geometries, simultaneously. Moreover, we pioneer implementing the concept of remarkable batch size (i.e., the number of geometries fed into PIPN at each sub-epoch) into PIPN. Specifically, we try batch sizes of 7, 14, 19, 38, 76, and 133. Additionally, the effect of the PIPN size, symmetric function in the PIPN architecture, and static and dynamic weights for the component of the sparse labeled data in the loss function are investigated.  ( 2 min )
    Efficient Algorithms for Learning from Coarse Labels. (arXiv:2108.09805v2 [cs.LG] UPDATED)
    For many learning problems one may not have access to fine grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and study the problem of learning from such coarse data. Instead of observing the actual labels from a set $\mathcal{Z}$, we observe coarse labels corresponding to a partition of $\mathcal{Z}$ (or a mixture of partitions). Our main algorithmic result is that essentially any problem learnable from fine grained labels can also be learned efficiently when the coarse data are sufficiently informative. We obtain our result through a generic reduction for answering Statistical Queries (SQ) over fine grained labels given only coarse labels. The number of coarse labels required depends polynomially on the information distortion due to coarsening and the number of fine labels $|\mathcal{Z}|$. We also investigate the case of (infinitely many) real valued labels focusing on a central problem in censored and truncated statistics: Gaussian mean estimation from coarse data. We provide an efficient algorithm when the sets in the partition are convex and establish that the problem is NP-hard even for very simple non-convex sets.
    On the Symmetries of Deep Learning Models and their Internal Representations. (arXiv:2205.14258v5 [cs.LG] UPDATED)
    Symmetry is a fundamental tool in the exploration of a broad range of complex systems. In machine learning symmetry has been explored in both models and data. In this paper we seek to connect the symmetries arising from the architecture of a family of models with the symmetries of that family's internal representation of data. We do this by calculating a set of fundamental symmetry groups, which we call the intertwiner groups of the model. We connect intertwiner groups to a model's internal representations of data through a range of experiments that probe similarities between hidden states across models with the same architecture. Our work suggests that the symmetries of a network are propagated into the symmetries in that network's representation of data, providing us with a better understanding of how architecture affects the learning and prediction process. Finally, we speculate that for ReLU networks, the intertwiner groups may provide a justification for the common practice of concentrating model interpretability exploration on the activation basis in hidden layers rather than arbitrary linear combinations thereof.
    How many dimensions are required to find an adversarial example?. (arXiv:2303.14173v1 [cs.LG])
    Past work exploring adversarial vulnerability have focused on situations where an adversary can perturb all dimensions of model input. On the other hand, a range of recent works consider the case where either (i) an adversary can perturb a limited number of input parameters or (ii) a subset of modalities in a multimodal problem. In both of these cases, adversarial examples are effectively constrained to a subspace $V$ in the ambient input space $\mathcal{X}$. Motivated by this, in this work we investigate how adversarial vulnerability depends on $\dim(V)$. In particular, we show that the adversarial success of standard PGD attacks with $\ell^p$ norm constraints behaves like a monotonically increasing function of $\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$ where $\epsilon$ is the perturbation budget and $\frac{1}{p} + \frac{1}{q} =1$, provided $p > 1$ (the case $p=1$ presents additional subtleties which we analyze in some detail). This functional form can be easily derived from a simple toy linear model, and as such our results land further credence to arguments that adversarial examples are endemic to locally linear models on high dimensional spaces.
    Local Clustering in Contextual Multi-Armed Bandits. (arXiv:2103.00063v3 [cs.LG] UPDATED)
    We study identifying user clusters in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real applications, such as content recommendation and online advertisement. In practice, user dependency plays an essential role in the user's actions, and thus the rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Different from traditional clustering settings, we cluster users based on the unknown bandit parameters, which will be estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB, and propose a bandit algorithm, LOCB, embedded with local clustering procedure. And, we provide theoretical analysis about LOCB in terms of the correctness and efficiency of clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects, which outperforms state-of-the-art baselines.
    Factorizers for Distributed Sparse Block Codes. (arXiv:2303.13957v1 [cs.CV])
    Distributed sparse block codes (SBCs) exhibit compact representations for encoding and manipulating symbolic data structures using fixed-with vectors. One major challenge however is to disentangle, or factorize, such data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when queried by noisy SBCs wherein symbol representations are relaxed due to perceptual uncertainty and approximations made when modern neural networks are used to generate the query vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible and hence generalized form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, a conditional random sampling, and an $\ell_\infty$-based similarity metric. Its random sampling mechanism in combination with the search in superposition allows to analytically determine the expected number of decoding iterations, which matches the empirical observations up to the GSBC's bundling capacity. Secondly, the proposed factorizer maintains its high accuracy when queried by noisy product vectors generated using deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby C trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer having F-factor codebooks, each with $\sqrt[\leftroot{-2}\uproot{2}F]{C}$ fixed codevectors. We provide a methodology to flexibly integrate our factorizer in the classification layer of CNNs with a novel loss function. We demonstrate the feasibility of our method on four deep CNN architectures over CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the number of parameters and operations are significantly reduced compared to the FCL.
    Physics Symbolic Learner for Discovering Ground-Motion Models Via NGA-West2 Database. (arXiv:2303.14179v1 [cs.LG])
    Ground-motion model (GMM) is the basis of many earthquake engineering studies. In this study, a novel physics-informed symbolic learner (PISL) method based on the Nest Generation Attenuation-West2 database is proposed to automatically discover mathematical equation operators as symbols. The sequential threshold ridge regression algorithm is utilized to distill a concise and interpretable explicit characterization of complex systems of ground motions. In addition to the basic variables retrieved from previous GMMs, the current PISL incorporates two a priori physical conditions, namely, distance and amplitude saturation. GMMs developed using the PISL, an empirical regression method (ERM), and an artificial neural network (ANN) are compared in terms of residuals and extrapolation based on obtained data of peak ground acceleration and velocity. The results show that the inter- and intra-event standard deviations of the three methods are similar. The functional form of the PISL is more concise than that of the ERM and ANN. The extrapolation capability of the PISL is more accurate than that of the ANN. The PISL-GMM used in this study provide a new paradigm of regression that considers both physical and data-driven machine learning and can be used to identify the implied physical relationships and prediction equations of ground motion variables in different regions.
    Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors. (arXiv:2208.11356v2 [cs.CV] UPDATED)
    Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.
    Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents. (arXiv:2011.13704v2 [stat.ML] UPDATED)
    Discrete latent variables are considered important for real world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, which has motivated different strategies to manipulate discrete distributions in order to train discrete VAEs similarly to conventional ones. Here we ask if it is also possible to keep the discrete nature of the latents fully intact by applying a direct discrete optimization for the encoding model. The approach is consequently strongly diverting from standard VAE-training by sidestepping sampling approximation, reparameterization trick and amortization. Discrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms. For VAEs with binary latents, we (A) show how such a discrete variational method ties into gradient ascent for network weights, and (B) how the decoder is used to select latent states for training. Conventional amortized training is more efficient and applicable to large neural networks. However, using smaller networks, we here find direct discrete optimization to be efficiently scalable to hundreds of latents. More importantly, we find the effectiveness of direct optimization to be highly competitive in `zero-shot' learning. In contrast to large supervised networks, the here investigated VAEs can, e.g., denoise a single image without previous training on clean data and/or training on large image datasets. More generally, the studied approach shows that training of VAEs is indeed possible without sampling-based approximation and reparameterization, which may be interesting for the analysis of VAE-training in general. For `zero-shot' settings a direct optimization, furthermore, makes VAEs competitive where they have previously been outperformed by non-generative approaches.
    Euler Characteristic Tools For Topological Data Analysis. (arXiv:2303.14040v1 [cs.LG])
    In this article, we study Euler characteristic techniques in topological data analysis. Pointwise computing the Euler characteristic of a family of simplicial complexes built from data gives rise to the so-called Euler characteristic profile. We show that this simple descriptor achieve state-of-the-art performance in supervised tasks at a very low computational cost. Inspired by signal analysis, we compute hybrid transforms of Euler characteristic profiles. These integral transforms mix Euler characteristic techniques with Lebesgue integration to provide highly efficient compressors of topological signals. As a consequence, they show remarkable performances in unsupervised settings. On the qualitative side, we provide numerous heuristics on the topological and geometric information captured by Euler profiles and their hybrid transforms. Finally, we prove stability results for these descriptors as well as asymptotic guarantees in random settings.
    Near Optimal Adversarial Attack on UCB Bandits. (arXiv:2008.09312v3 [cs.LG] UPDATED)
    We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.
    RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit. (arXiv:2112.05282v3 [cs.LG] UPDATED)
    Machine learning models are critically susceptible to evasion attacks from adversarial examples. Generally, adversarial examples, modified inputs deceptively similar to the original input, are constructed under whitebox settings by adversaries with full access to the model. However, recent attacks have shown a remarkable reduction in query numbers to craft adversarial examples using blackbox attacks. Particularly, alarming is the ability to exploit the classification decision from the access interface of a trained model provided by a growing number of Machine Learning as a Service providers including Google, Microsoft, IBM and used by a plethora of applications incorporating these models. The ability of an adversary to exploit only the predicted label from a model to craft adversarial examples is distinguished as a decision-based attack. In our study, we first deep dive into recent state-of-the-art decision-based attacks in ICLR and SP to highlight the costly nature of discovering low distortion adversarial employing gradient estimation methods. We develop a robust query efficient attack capable of avoiding entrapment in a local minimum and misdirection from noisy gradients seen in gradient estimation methods. The attack method we propose, RamBoAttack, exploits the notion of Randomized Block Coordinate Descent to explore the hidden classifier manifold, targeting perturbations to manipulate only localized input features to address the issues of gradient estimation methods. Importantly, the RamBoAttack is more robust to the different sample inputs available to an adversary and the targeted class. Overall, for a given target class, RamBoAttack is demonstrated to be more robust at achieving a lower distortion within a given query budget. We curate our extensive results using the large-scale high-resolution ImageNet dataset and open-source our attack, test samples and artifacts on GitHub.
    Local Contrastive Learning for Medical Image Recognition. (arXiv:2303.14153v1 [cs.CV])
    The proliferation of Deep Learning (DL)-based methods for radiographic image analysis has created a great demand for expert-labeled radiology data. Recent self-supervised frameworks have alleviated the need for expert labeling by obtaining supervision from associated radiology reports. These frameworks, however, struggle to distinguish the subtle differences between different pathologies in medical images. Additionally, many of them do not provide interpretation between image regions and text, making it difficult for radiologists to assess model predictions. In this work, we propose Local Region Contrastive Learning (LRCLR), a flexible fine-tuning framework that adds layers for significant image region selection as well as cross-modality interaction. Our results on an external validation set of chest x-rays suggest that LRCLR identifies significant local image regions and provides meaningful interpretation against radiology text while improving zero-shot performance on several chest x-ray medical findings.
    Gradient scarcity with Bilevel Optimization for Graph Learning. (arXiv:2303.13964v1 [cs.LG])
    A common issue in graph learning under the semi-supervised setting is referred to as gradient scarcity. That is, learning graphs by minimizing a loss on a subset of nodes causes edges between unlabelled nodes that are far from labelled ones to receive zero gradients. The phenomenon was first described when optimizing the graph and the weights of a Graph Neural Network (GCN) with a joint optimization algorithm. In this work, we give a precise mathematical characterization of this phenomenon, and prove that it also emerges in bilevel optimization, where additional dependency exists between the parameters of the problem. While for GCNs gradient scarcity occurs due to their finite receptive field, we show that it also occurs with the Laplacian regularization model, in the sense that gradients amplitude decreases exponentially with distance to labelled nodes. To alleviate this issue, we study several solutions: we propose to resort to latent graph learning using a Graph-to-Graph model (G2G), graph regularization to impose a prior structure on the graph, or optimizing on a larger graph than the original one with a reduced diameter. Our experiments on synthetic and real datasets validate our analysis and prove the efficiency of the proposed solutions.
    Interpretable Anomaly Detection via Discrete Optimization. (arXiv:2303.14111v1 [cs.LG])
    Anomaly detection is essential in many application domains, such as cyber security, law enforcement, medicine, and fraud protection. However, the decision-making of current deep learning approaches is notoriously hard to understand, which often limits their practical applicability. To overcome this limitation, we propose a framework for learning inherently interpretable anomaly detectors from sequential data. More specifically, we consider the task of learning a deterministic finite automaton (DFA) from a given multi-set of unlabeled sequences. We show that this problem is computationally hard and develop two learning algorithms based on constraint optimization. Moreover, we introduce novel regularization schemes for our optimization problems that improve the overall interpretability of our DFAs. Using a prototype implementation, we demonstrate that our approach shows promising results in terms of accuracy and F1 score.
    Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing. (arXiv:2303.14077v1 [cs.CV])
    Deep neural networks can be easily fooled into making incorrect predictions through corruption of the input by adversarial perturbations: human-imperceptible artificial noise. So far adversarial training has been the most successful defense against such adversarial attacks. This work focuses on improving adversarial training to boost adversarial robustness. We first analyze, from an instance-wise perspective, how adversarial vulnerability evolves during adversarial training. We find that during training an overall reduction of adversarial loss is achieved by sacrificing a considerable proportion of training samples to be more vulnerable to adversarial attack, which results in an uneven distribution of adversarial vulnerability among data. Such "uneven vulnerability", is prevalent across several popular robust training methods and, more importantly, relates to overfitting in adversarial training. Motivated by this observation, we propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training (ISEAT). It jointly smooths both input and weight loss landscapes in an adaptive, instance-specific, way to enhance robustness more for those samples with higher adversarial vulnerability. Extensive experiments demonstrate the superiority of our method over existing defense methods. Noticeably, our method, when combined with the latest data augmentation and semi-supervised learning techniques, achieves state-of-the-art robustness against $\ell_{\infty}$-norm constrained attacks on CIFAR10 of 59.32% for Wide ResNet34-10 without extra data, and 61.55% for Wide ResNet28-10 with extra data. Code is available at https://github.com/TreeLLi/Instance-adaptive-Smoothness-Enhanced-AT.
    Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis. (arXiv:2303.14157v1 [cs.CV])
    Any-scale image synthesis offers an efficient and scalable solution to synthesize photo-realistic images at any scale, even going beyond 2K resolution. However, existing GAN-based solutions depend excessively on convolutions and a hierarchical architecture, which introduce inconsistency and the $``$texture sticking$"$ issue when scaling the output resolution. From another perspective, INR-based generators are scale-equivariant by design, but their huge memory footprint and slow inference hinder these networks from being adopted in large-scale or real-time systems. In this work, we propose $\textbf{C}$olumn-$\textbf{R}$ow $\textbf{E}$ntangled $\textbf{P}$ixel $\textbf{S}$ynthesis ($\textbf{CREPS}$), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. To save memory footprint and make the system scalable, we employ a novel bi-line representation that decomposes layer-wise feature maps into separate $``$thick$"$ column and row encodings. Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images at any arbitrary resolution with proper training and inference speed. Code is available at https://github.com/VinAIResearch/CREPS.
    Toward Open-domain Slot Filling via Self-supervised Co-training. (arXiv:2303.13801v1 [cs.CL])
    Slot filling is one of the critical tasks in modern conversational systems. The majority of existing literature employs supervised learning methods, which require labeled training data for each new domain. Zero-shot learning and weak supervision approaches, among others, have shown promise as alternatives to manual labeling. Nonetheless, these learning paradigms are significantly inferior to supervised learning approaches in terms of performance. To minimize this performance gap and demonstrate the possibility of open-domain slot filling, we propose a Self-supervised Co-training framework, called SCot, that requires zero in-domain manually labeled training examples and works in three phases. Phase one acquires two sets of complementary pseudo labels automatically. Phase two leverages the power of the pre-trained language model BERT, by adapting it for the slot filling task using these sets of pseudo labels. In phase three, we introduce a self-supervised cotraining mechanism, where both models automatically select highconfidence soft labels to further improve the performance of the other in an iterative fashion. Our thorough evaluations show that SCot outperforms state-of-the-art models by 45.57% and 37.56% on SGD and MultiWoZ datasets, respectively. Moreover, our proposed framework SCot achieves comparable performance when compared to state-of-the-art fully supervised models.
    Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck. (arXiv:2303.14096v1 [cs.LG])
    In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition (i.e., less generalizable), so that one cannot prevent a model from co-adapting on such (so-called) "shortcut" signals: this makes the model fragile in various distribution shifts. To bypass such failure modes, we consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. This motivates us to extend the standard information bottleneck to additionally model the nuisance information. We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training concerning both convolutional- and Transformer-based architectures. Our experimental results show that the proposed scheme improves robustness of learned representations (remarkably without using any domain-specific knowledge), with respect to multiple challenging reliability measures. For example, our model could advance the state-of-the-art on a recent challenging OBJECTS benchmark in novelty detection by $78.4\% \rightarrow 87.2\%$ in AUROC, while simultaneously enjoying improved corruption, background and (certified) adversarial robustness. Code is available at https://github.com/jh-jeong/nuisance_ib.
    Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle. (arXiv:2303.14151v1 [cs.LG])
    Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data, the dimensionality of the data and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously all present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors are ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.
    ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale. (arXiv:2303.14006v1 [cs.DC])
    As deep learning models and input data are scaling at an unprecedented rate, it is inevitable to move towards distributed training platforms to fit the model and increase training throughput. State-of-the-art approaches and techniques, such as wafer-scale nodes, multi-dimensional network topologies, disaggregated memory systems, and parallelization strategies, have been actively adopted by emerging distributed training systems. This results in a complex SW/HW co-design stack of distributed training, necessitating a modeling/simulation infrastructure for design-space exploration. In this paper, we extend the open-source ASTRA-sim infrastructure and endow it with the capabilities to model state-of-the-art and emerging distributed training models and platforms. More specifically, (i) we enable ASTRA-sim to support arbitrary model parallelization strategies via a graph-based training-loop implementation, (ii) we implement a parameterizable multi-dimensional heterogeneous topology generation infrastructure with analytical performance estimates enabling simulating target systems at scale, and (iii) we enhance the memory system modeling to support accurate modeling of in-network collective communication and disaggregated memory systems. With such capabilities, we run comprehensive case studies targeting emerging distributed models and platforms. This infrastructure lets system designers swiftly traverse the complex co-design stack and give meaningful insights when designing and deploying distributed training platforms at scale.
    Online Learning for the Random Feature Model in the Student-Teacher Framework. (arXiv:2303.14083v1 [cs.LG])
    Deep neural networks are widely used prediction algorithms whose performance often improves as the number of weights increases, leading to over-parametrization. We consider a two-layered neural network whose first layer is frozen while the last layer is trainable, known as the random feature model. We study over-parametrization in the context of a student-teacher framework by deriving a set of differential equations for the learning dynamics. For any finite ratio of hidden layer size and input dimension, the student cannot generalize perfectly, and we compute the non-zero asymptotic generalization error. Only when the student's hidden layer size is exponentially larger than the input dimension, an approach to perfect generalization is possible.
    Optimal Transport for Offline Imitation Learning. (arXiv:2303.13971v1 [cs.LG])
    With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this paper, we introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories, with a few high-quality demonstrations. OTR's key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration to obtain a similarity measure that can be interpreted as a reward, which can then be used by an offline RL algorithm to learn the policy. OTR is easy to implement and computationally efficient. On D4RL benchmarks, we show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.
    A CNN-LSTM Architecture for Marine Vessel Track Association Using Automatic Identification System (AIS) Data. (arXiv:2303.14068v1 [cs.LG])
    In marine surveillance, distinguishing between normal and anomalous vessel movement patterns is critical for identifying potential threats in a timely manner. Once detected, it is important to monitor and track these vessels until a necessary intervention occurs. To achieve this, track association algorithms are used, which take sequential observations comprising geological and motion parameters of the vessels and associate them with respective vessels. The spatial and temporal variations inherent in these sequential observations make the association task challenging for traditional multi-object tracking algorithms. Additionally, the presence of overlapping tracks and missing data can further complicate the trajectory tracking process. To address these challenges, in this study, we approach this tracking task as a multivariate time series problem and introduce a 1D CNN-LSTM architecture-based framework for track association. This special neural network architecture can capture the spatial patterns as well as the long-term temporal relations that exist among the sequential observations. During the training process, it learns and builds the trajectory for each of these underlying vessels. Once trained, the proposed framework takes the marine vessel's location and motion data collected through the Automatic Identification System (AIS) as input and returns the most likely vessel track as output in real-time. To evaluate the performance of our approach, we utilize an AIS dataset containing observations from 327 vessels traveling in a specific geographic region. We measure the performance of our proposed framework using standard performance metrics such as accuracy, precision, recall, and F1 score. When compared with other competitive neural network architectures our approach demonstrates a superior tracking performance.
    Efficient and Direct Inference of Heart Rate Variability using Both Signal Processing and Machine Learning. (arXiv:2303.13637v1 [cs.LG])
    Heart Rate Variability (HRV) measures the variation of the time between consecutive heartbeats and is a major indicator of physical and mental health. Recent research has demonstrated that photoplethysmography (PPG) sensors can be used to infer HRV. However, many prior studies had high errors because they only employed signal processing or machine learning (ML), or because they indirectly inferred HRV, or because there lacks large training datasets. Many prior studies may also require large ML models. The low accuracy and large model sizes limit their applications to small embedded devices and potential future use in healthcare. To address the above issues, we first collected a large dataset of PPG signals and HRV ground truth. With this dataset, we developed HRV models that combine signal processing and ML to directly infer HRV. Evaluation results show that our method had errors between 3.5% to 25.7% and outperformed signal-processing-only and ML-only methods. We also explored different ML models, which showed that Decision Trees and Multi-level Perceptrons have 13.0% and 9.1% errors on average with models at most hundreds of KB and inference time less than 1ms. Hence, they are more suitable for small embedded devices and potentially enable the future use of PPG-based HRV monitoring in healthcare.  ( 2 min )
    How Does Attention Work in Vision Transformers? A Visual Analytics Attempt. (arXiv:2303.13731v1 [cs.LG])
    Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attentions are then applied to the sequence to learn the attention between patches. Despite many successful interpretations of transformers on sequential data, little effort has been devoted to the interpretation of ViTs, and many questions remain unanswered. For example, among the numerous attention heads, which one is more important? How strong are individual patches attending to their spatial neighbors in different heads? What attention patterns have individual heads learned? In this work, we answer these questions through a visual analytics approach. Specifically, we first identify what heads are more important in ViTs by introducing multiple pruning-based metrics. Then, we profile the spatial distribution of attention strengths between patches inside individual heads, as well as the trend of attention strengths across attention layers. Third, using an autoencoder-based learning solution, we summarize all possible attention patterns that individual heads could learn. Examining the attention strengths and patterns of the important heads, we answer why they are important. Through concrete case studies with experienced deep learning experts on multiple ViTs, we validate the effectiveness of our solution that deepens the understanding of ViTs from head importance, head attention strength, and head attention pattern.
    Removing confounding information from fetal ultrasound images. (arXiv:2303.13918v1 [cs.CV])
    Confounding information in the form of text or markings embedded in medical images can severely affect the training of diagnostic deep learning algorithms. However, data collected for clinical purposes often have such markings embedded in them. In dermatology, known examples include drawings or rulers that are overrepresented in images of malignant lesions. In this paper, we encounter text and calipers placed on the images found in national databases containing fetal screening ultrasound scans, which correlate with standard planes to be predicted. In order to utilize the vast amounts of data available in these databases, we develop and validate a series of methods for minimizing the confounding effects of embedded text and calipers on deep learning algorithms designed for ultrasound, using standard plane classification as a test case.
    Improving Prediction Performance and Model Interpretability through Attention Mechanisms from Basic and Applied Research Perspectives. (arXiv:2303.14116v1 [cs.LG])
    With the dramatic advances in deep learning technology, machine learning research is focusing on improving the interpretability of model predictions as well as prediction performance in both basic and applied research. While deep learning models have much higher prediction performance than traditional machine learning models, the specific prediction process is still difficult to interpret and/or explain. This is known as the black-boxing of machine learning models and is recognized as a particularly important problem in a wide range of research fields, including manufacturing, commerce, robotics, and other industries where the use of such technology has become commonplace, as well as the medical field, where mistakes are not tolerated. This bulletin is based on the summary of the author's dissertation. The research summarized in the dissertation focuses on the attention mechanism, which has been the focus of much attention in recent years, and discusses its potential for both basic research in terms of improving prediction performance and interpretability, and applied research in terms of evaluating it for real-world applications using large data sets beyond the laboratory environment. The dissertation also concludes with a summary of the implications of these findings for subsequent research and future prospects in the field.
    Particle Mean Field Variational Bayes. (arXiv:2303.13930v1 [stat.CO])
    The Mean Field Variational Bayes (MFVB) method is one of the most computationally efficient techniques for Bayesian inference. However, its use has been restricted to models with conjugate priors or those that require analytical calculations. This paper proposes a novel particle-based MFVB approach that greatly expands the applicability of the MFVB method. We establish the theoretical basis of the new method by leveraging the connection between Wasserstein gradient flows and Langevin diffusion dynamics, and demonstrate the effectiveness of this approach using Bayesian logistic regression, stochastic volatility, and deep neural networks.
    PENTACET data -- 23 Million Contextual Code Comments and 500,000 SATD comments. (arXiv:2303.14029v1 [cs.SE])
    Most Self-Admitted Technical Debt (SATD) research utilizes explicit SATD features such as 'TODO' and 'FIXME' for SATD detection. A closer look reveals several SATD research uses simple SATD ('Easy to Find') code comments without the contextual data (preceding and succeeding source code context). This work addresses this gap through PENTACET (or 5C dataset) data. PENTACET is a large Curated Contextual Code Comments per Contributor and the most extensive SATD data. We mine 9,096 Open Source Software Java projects with a total of 435 million LOC. The outcome is a dataset with 23 million code comments, preceding and succeeding source code context for each comment, and more than 500,000 comments labeled as SATD, including both 'Easy to Find' and 'Hard to Find' SATD. We believe PENTACET data will further SATD research using Artificial Intelligence techniques.
    Factor Decomposed Generative Adversarial Networks for Text-to-Image Synthesis. (arXiv:2303.13821v1 [cs.MM])
    Prior works about text-to-image synthesis typically concatenated the sentence embedding with the noise vector, while the sentence embedding and the noise vector are two different factors, which control the different aspects of the generation. Simply concatenating them will entangle the latent factors and encumber the generative model. In this paper, we attempt to decompose these two factors and propose Factor Decomposed Generative Adversarial Networks~(FDGAN). To achieve this, we firstly generate images from the noise vector and then apply the sentence embedding in the normalization layer for both generator and discriminators. We also design an additive norm layer to align and fuse the text-image features. The experimental results show that decomposing the noise and the sentence embedding can disentangle latent factors in text-to-image synthesis, and make the generative model more efficient. Compared with the baseline, FDGAN can achieve better performance, while fewer parameters are used.
    Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training. (arXiv:2303.13635v1 [cs.LG])
    Deep neural networks have achieved great success in many data processing applications. However, the high computational complexity and storage cost makes deep learning hard to be used on resource-constrained devices, and it is not environmental-friendly with much power cost. In this paper, we focus on low-rank optimization for efficient deep learning techniques. In the space domain, deep neural networks are compressed by low rank approximation of the network parameters, which directly reduces the storage requirement with a smaller number of network parameters. In the time domain, the network parameters can be trained in a few subspaces, which enables efficient training for fast convergence. The model compression in the spatial domain is summarized into three categories as pre-train, pre-set, and compression-aware methods, respectively. With a series of integrable techniques discussed, such as sparse pruning, quantization, and entropy coding, we can ensemble them in an integration framework with lower computational complexity and storage. Besides of summary of recent technical advances, we have two findings for motivating future works: one is that the effective rank outperforms other sparse measures for network compression. The other is a spatial and temporal balance for tensorized neural networks.  ( 2 min )
    FixFit: using parameter-compression to solve the inverse problem in overdetermined models. (arXiv:2303.13746v1 [cs.LG])
    All fields of science depend on mathematical models. One of the fundamental problems with using complex nonlinear models is that data-driven parameter estimation often fails because interactions between model parameters lead to multiple parameter sets fitting the data equally well. Here, we develop a new method to address this problem, FixFit, which compresses a given mathematical model's parameters into a latent representation unique to model outputs. We acquire this representation by training a neural network with a bottleneck layer on data pairs of model parameters and model outputs. The bottleneck layer nodes correspond to the unique latent parameters, and their dimensionality indicates the information content of the model. The trained neural network can be split at the bottleneck layer into an encoder to characterize the redundancies and a decoder to uniquely infer latent parameters from measurements. We demonstrate FixFit in two use cases drawn from classical physics and neuroscience.
    Efficient Symbolic Reasoning for Neural-Network Verification. (arXiv:2303.13588v1 [cs.AI])
    The neural network has become an integral part of modern software systems. However, they still suffer from various problems, in particular, vulnerability to adversarial attacks. In this work, we present a novel program reasoning framework for neural-network verification, which we refer to as symbolic reasoning. The key components of our framework are the use of the symbolic domain and the quadratic relation. The symbolic domain has very flexible semantics, and the quadratic relation is quite expressive. They allow us to encode many verification problems for neural networks as quadratic programs. Our scheme then relaxes the quadratic programs to semidefinite programs, which can be efficiently solved. This framework allows us to verify various neural-network properties under different scenarios, especially those that appear challenging for non-symbolic domains. Moreover, it introduces new representations and perspectives for the verification tasks. We believe that our framework can bring new theoretical insights and practical tools to verification problems for neural networks.
    Mixed-Type Wafer Classification For Low Memory Devices Using Knowledge Distillation. (arXiv:2303.13974v1 [cs.LG])
    Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which may further provide insight for yield improvement in wafer foundry. During manufacturing, various defects may appear standalone in the wafer or may appear as different combinations. Identifying multiple defects in a wafer is generally harder compared to identifying a single defect. Recently, deep learning methods have gained significant traction in mixed-type DPR. However, the complexity of defects requires complex and large models making them very difficult to operate on low-memory embedded devices typically used in fabrication labs. Another common issue is the unavailability of labeled data to train complex networks. In this work, we propose an unsupervised training routine to distill the knowledge of complex pre-trained models to lightweight deployment-ready models. We empirically show that this type of training compresses the model without sacrificing accuracy despite being up to 10 times smaller than the teacher model. The compressed model also manages to outperform contemporary state-of-the-art models.
    Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs. (arXiv:2303.13763v1 [cs.LG])
    Distilling high-accuracy Graph Neural Networks~(GNNs) to low-latency multilayer perceptrons~(MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on the node features and fail to capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable for various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation~(PGKD) method, which does not require graph edges~(edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD.
    Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI. (arXiv:2303.13972v1 [cs.LG])
    Modern AI practices all strive towards the same goal: better results. In the context of deep learning, the term "results" often refers to the achieved accuracy on a competitive problem set. In this paper, we adopt an idea from the emerging field of Green AI to consider energy consumption as a metric of equal importance to accuracy and to reduce any irrelevant tasks or energy usage. We examine the training stage of the deep learning pipeline from a sustainability perspective, through the study of hyperparameter tuning strategies and the model complexity, two factors vastly impacting the overall pipeline's energy consumption. First, we investigate the effectiveness of grid search, random search and Bayesian optimisation during hyperparameter tuning, and we find that Bayesian optimisation significantly dominates the other strategies. Furthermore, we analyse the architecture of convolutional neural networks with the energy consumption of three prominent layer types: convolutional, linear and ReLU layers. The results show that convolutional layers are the most computationally expensive by a strong margin. Additionally, we observe diminishing returns in accuracy for more energy-hungry models. The overall energy consumption of training can be halved by reducing the network complexity. In conclusion, we highlight innovative and promising energy-efficient practices for training deep learning models. To expand the application of Green AI, we advocate for a shift in the design of deep learning models, by considering the trade-off between energy efficiency and accuracy.
    Efficient Mixed-Type Wafer Defect Pattern Recognition Using Compact Deformable Convolutional Transformers. (arXiv:2303.13827v1 [cs.CV])
    Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial to find the root cause of the issue and further improving the yield in the wafer foundry. Mixed-type DPR is much more complicated compared to single-type DPR due to varied spatial features, the uncertainty of defects, and the number of defects present. To accurately predict the number of defects as well as the types of defects, we propose a novel compact deformable convolutional transformer (DC Transformer). Specifically, DC Transformer focuses on the global features present in the wafer map by virtue of learnable deformable kernels and multi-head attention to the global features. The proposed method succinctly models the internal relationship between the wafer maps and the defects. DC Transformer is evaluated on a real dataset containing 38 defect patterns. Experimental results show that DC Transformer performs exceptionally well in recognizing both single and mixed-type defects. The proposed method outperforms the current state of the models by a considerable margin
    GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism. (arXiv:2303.13775v1 [cs.DC])
    Large-scale graphs with billions of edges are ubiquitous in many industries, science, and engineering fields such as recommendation systems, social graph analysis, knowledge base, material science, and biology. Graph neural networks (GNN), an emerging class of machine learning models, are increasingly adopted to learn on these graphs due to their superior performance in various graph analytics tasks. Mini-batch training is commonly adopted to train on large graphs, and data parallelism is the standard approach to scale mini-batch training to multiple GPUs. In this paper, we argue that several fundamental performance bottlenecks of GNN training systems have to do with inherent limitations of the data parallel approach. We then propose split parallelism, a novel parallel mini-batch training paradigm. We implement split parallelism in a novel system called gsplit and show that it outperforms state-of-the-art systems such as DGL, Quiver, and PaGraph.
    Efficient and Accurate Co-Visible Region Localization with Matching Key-Points Crop (MKPC): A Two-Stage Pipeline for Enhancing Image Matching Performance. (arXiv:2303.13794v1 [cs.CV])
    Image matching is a classic and fundamental task in computer vision. In this paper, under the hypothesis that the areas outside the co-visible regions carry little information, we propose a matching key-points crop (MKPC) algorithm. The MKPC locates, proposes and crops the critical regions, which are the co-visible areas with great efficiency and accuracy. Furthermore, building upon MKPC, we propose a general two-stage pipeline for image matching, which is compatible to any image matching models or combinations. We experimented with plugging SuperPoint + SuperGlue into the two-stage pipeline, whose results show that our method enhances the performance for outdoor pose estimations. What's more, in a fair comparative condition, our method outperforms the SOTA on Image Matching Challenge 2022 Benchmark, which represents the hardest outdoor benchmark of image matching currently.
    Remind of the Past: Incremental Learning with Analogical Prompts. (arXiv:2303.13898v1 [cs.CV])
    Although data-free incremental learning methods are memory-friendly, accurately estimating and counteracting representation shifts is challenging in the absence of historical data. This paper addresses this thorny problem by proposing a novel incremental learning method inspired by human analogy capabilities. Specifically, we design an analogy-making mechanism to remap the new data into the old class by prompt tuning. It mimics the feature distribution of the target old class on the old model using only samples of new classes. The learnt prompts are further used to estimate and counteract the representation shift caused by fine-tuning for the historical prototypes. The proposed method sets up new state-of-the-art performance on four incremental learning benchmarks under both the class and domain incremental learning settings. It consistently outperforms data-replay methods by only saving feature prototypes for each class. It has almost hit the empirical upper bound by joint training on the Core50 benchmark. The code will be released at \url{https://github.com/ZhihengCV/A-Prompts}.
    Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data. (arXiv:2303.13664v1 [cs.CV])
    Most approaches for self-supervised learning (SSL) are optimised on curated balanced datasets, e.g. ImageNet, despite the fact that natural data usually exhibits long-tail distributions. In this paper, we analyse the behaviour of one of the most popular variants of SSL, i.e. contrastive methods, on long-tail data. In particular, we investigate the role of the temperature parameter $\tau$ in the contrastive loss, by analysing the loss through the lens of average distance maximisation, and find that a large $\tau$ emphasises group-wise discrimination, whereas a small $\tau$ leads to a higher degree of instance discrimination. While $\tau$ has thus far been treated exclusively as a constant hyperparameter, in this work, we propose to employ a dynamic $\tau$ and show that a simple cosine schedule can yield significant improvements in the learnt representations. Such a schedule results in a constant `task switching' between an emphasis on instance discrimination and group-wise discrimination and thereby ensures that the model learns both group-wise features, as well as instance-specific details. Since frequent classes benefit from the former, while infrequent classes require the latter, we find this method to consistently improve separation between the classes in long-tail data without any additional computational cost.
    Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers. (arXiv:2303.13755v1 [cs.CV])
    Vision Transformers (ViT) have shown their competitive advantages performance-wise compared to convolutional neural networks (CNNs) though they often come with high computational costs. To this end, previous methods explore different attention patterns by limiting a fixed number of spatially nearby tokens to accelerate the ViT's multi-head self-attention (MHSA) operations. However, such structured attention patterns limit the token-to-token connections to their spatial relevance, which disregards learned semantic connections from a full attention mask. In this work, we propose a novel approach to learn instance-dependent attention patterns, by devising a lightweight connectivity predictor module to estimate the connectivity score of each pair of tokens. Intuitively, two tokens have high connectivity scores if the features are considered relevant either spatially or semantically. As each token only attends to a small number of other tokens, the binarized connectivity masks are often very sparse by nature and therefore provide the opportunity to accelerate the network via sparse computations. Equipped with the learned unstructured attention pattern, sparse attention ViT (Sparsifiner) produces a superior Pareto-optimal trade-off between FLOPs and top-1 accuracy on ImageNet compared to token sparsity. Our method reduces 48% to 69% FLOPs of MHSA while the accuracy drop is within 0.4%. We also show that combining attention and token sparsity reduces ViT FLOPs by over 60%.
    Convolutional Neural Networks for the classification of glitches in gravitational-wave data streams. (arXiv:2303.13917v1 [gr-qc])
    We investigate the use of Convolutional Neural Networks (including the modern ConvNeXt network family) to classify transient noise signals (i.e.~glitches) and gravitational waves in data from the Advanced LIGO detectors. First, we use models with a supervised learning approach, both trained from scratch using the Gravity Spy dataset and employing transfer learning by fine-tuning pre-trained models in this dataset. Second, we also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels. Our findings are very close to existing results for the same dataset, reaching values for the F1 score of 97.18% (94.15%) for the best supervised (self-supervised) model. We further test the models using actual gravitational-wave signals from LIGO-Virgo's O3 run. Although trained using data from previous runs (O1 and O2), the models show good performance, in particular when using transfer learning. We find that transfer learning improves the scores without the need for any training on real signals apart from the less than 50 chirp examples from hardware injections present in the Gravity Spy dataset. This motivates the use of transfer learning not only for glitch classification but also for signal classification.
    Leveraging Old Knowledge to Continually Learn New Classes in Medical Images. (arXiv:2303.13752v1 [cs.LG])
    Class-incremental continual learning is a core step towards developing artificial intelligence systems that can continuously adapt to changes in the environment by learning new concepts without forgetting those previously learned. This is especially needed in the medical domain where continually learning from new incoming data is required to classify an expanded set of diseases. In this work, we focus on how old knowledge can be leveraged to learn new classes without catastrophic forgetting. We propose a framework that comprises of two main components: (1) a dynamic architecture with expanding representations to preserve previously learned features and accommodate new features; and (2) a training procedure alternating between two objectives to balance the learning of new features while maintaining the model's performance on old classes. Experiment results on multiple medical datasets show that our solution is able to achieve superior performance over state-of-the-art baselines in terms of class accuracy and forgetting.
    Policy Evaluation in Distributional LQR. (arXiv:2303.13657v1 [math.OC])
    Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on linear quadratic regulator (LQR) for control, advocating for a new distributional approach to LQR, which we call \emph{distributional LQR}. Specifically, we provide a closed-form expression of the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results.
    Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects. (arXiv:2303.13769v1 [cs.CV])
    The recently proposed open-world object and open-set detection achieve a breakthrough in finding never-seen-before objects and distinguishing them from class-known ones. However, their studies on knowledge transfer from known classes to unknown ones need to be deeper, leading to the scanty capability for detecting unknowns hidden in the background. In this paper, we propose the unknown sniffer (UnSniffer) to find both unknown and known objects. Firstly, the generalized object confidence (GOC) score is introduced, which only uses class-known samples for supervision and avoids improper suppression of unknowns in the background. Significantly, such confidence score learned from class-known objects can be generalized to unknown ones. Additionally, we propose a negative energy suppression loss to further limit the non-object samples in the background. Next, the best box of each unknown is hard to obtain during inference due to lacking their semantic information in training. To solve this issue, we introduce a graph-based determination scheme to replace hand-designed non-maximum suppression (NMS) post-processing. Finally, we present the Unknown Object Detection Benchmark, the first publicly benchmark that encompasses precision evaluation for unknown object detection to our knowledge. Experiments show that our method is far better than the existing state-of-the-art methods. Code is available at: https://github.com/Went-Liang/UnSniffer.
    UniTS: A Universal Time Series Analysis Framework with Self-supervised Representation Learning. (arXiv:2303.13804v1 [cs.LG])
    Machine learning has emerged as a powerful tool for time series analysis. Existing methods are usually customized for different analysis tasks and face challenges in tackling practical problems such as partial labeling and domain shift. To achieve universal analysis and address the aforementioned problems, we develop UniTS, a novel framework that incorporates self-supervised representation learning (or pre-training). The components of UniTS are designed using sklearn-like APIs to allow flexible extensions. We demonstrate how users can easily perform an analysis task using the user-friendly GUIs, and show the superior performance of UniTS over the traditional task-specific methods without self-supervised pre-training on five mainstream tasks and two practical settings.
    PPG-based Heart Rate Estimation with Efficient Sensor Sampling and Learning Models. (arXiv:2303.13636v1 [cs.LG])
    Recent studies showed that Photoplethysmography (PPG) sensors embedded in wearable devices can estimate heart rate (HR) with high accuracy. However, despite of prior research efforts, applying PPG sensor based HR estimation to embedded devices still faces challenges due to the energy-intensive high-frequency PPG sampling and the resource-intensive machine-learning models. In this work, we aim to explore HR estimation techniques that are more suitable for lower-power and resource-constrained embedded devices. More specifically, we seek to design techniques that could provide high-accuracy HR estimation with low-frequency PPG sampling, small model size, and fast inference time. First, we show that by combining signal processing and ML, it is possible to reduce the PPG sampling frequency from 125 Hz to only 25 Hz while providing higher HR estimation accuracy. This combination also helps to reduce the ML model feature size, leading to smaller models. Additionally, we present a comprehensive analysis on different ML models and feature sizes to compare their accuracy, model size, and inference time. The models explored include Decision Tree (DT), Random Forest (RF), K-nearest neighbor (KNN), Support vector machines (SVM), and Multi-layer perceptron (MLP). Experiments were conducted using both a widely-utilized dataset and our self-collected dataset. The experimental results show that our method by combining signal processing and ML had only 5% error for HR estimation using low-frequency PPG data. Moreover, our analysis showed that DT models with 10 to 20 input features usually have good accuracy, while are several magnitude smaller in model sizes and faster in inference time.
    Forecasting Competitions with Correlated Events. (arXiv:2303.13793v1 [cs.LG])
    Beginning with Witkowski et al. [2022], recent work on forecasting competitions has addressed incentive problems with the common winner-take-all mechanism. Frongillo et al. [2021] propose a competition mechanism based on follow-the-regularized-leader (FTRL), an online learning framework. They show that their mechanism selects an $\epsilon$-optimal forecaster with high probability using only $O(\log(n)/\epsilon^2)$ events. These works, together with all prior work on this problem thus far, assume that events are independent. We initiate the study of forecasting competitions for correlated events. To quantify correlation, we introduce a notion of block correlation, which allows each event to be strongly correlated with up to $b$ others. We show that under distributions with this correlation, the FTRL mechanism retains its $\epsilon$-optimal guarantee using $O(b^2 \log(n)/\epsilon^2)$ events. Our proof involves a novel concentration bound for correlated random variables which may be of broader interest.
    A Survey on Secure and Private Federated Learning Using Blockchain: Theory and Application in Resource-constrained Computing. (arXiv:2303.13727v1 [cs.CR])
    Federated Learning (FL) has gained widespread popularity in recent years due to the fast booming of advanced machine learning and artificial intelligence along with emerging security and privacy threats. FL enables efficient model generation from local data storage of the edge devices without revealing the sensitive data to any entities. While this paradigm partly mitigates the privacy issues of users' sensitive data, the performance of the FL process can be threatened and reached a bottleneck due to the growing cyber threats and privacy violation techniques. To expedite the proliferation of FL process, the integration of blockchain for FL environments has drawn prolific attention from the people of academia and industry. Blockchain has the potential to prevent security and privacy threats with its decentralization, immutability, consensus, and transparency characteristic. However, if the blockchain mechanism requires costly computational resources, then the resource-constrained FL clients cannot be involved in the training. Considering that, this survey focuses on reviewing the challenges, solutions, and future directions for the successful deployment of blockchain in resource-constrained FL environments. We comprehensively review variant blockchain mechanisms that are suitable for FL process and discuss their trade-offs for a limited resource budget. Further, we extensively analyze the cyber threats that could be observed in a resource-constrained FL environment, and how blockchain can play a key role to block those cyber attacks. To this end, we highlight some potential solutions towards the coupling of blockchain and federated learning that can offer high levels of reliability, data privacy, and distributed computing performance.
    Artificial-intelligence-based molecular classification of diffuse gliomas using rapid, label-free optical imaging. (arXiv:2303.13610v1 [cs.CV])
    Molecular classification has transformed the management of brain tumors by enabling more accurate prognostication and personalized treatment. However, timely molecular diagnostic testing for patients with brain tumors is limited, complicating surgical and adjuvant treatment and obstructing clinical trial enrollment. In this study, we developed DeepGlioma, a rapid ($< 90$ seconds), artificial-intelligence-based diagnostic screening system to streamline the molecular diagnosis of diffuse gliomas. DeepGlioma is trained using a multimodal dataset that includes stimulated Raman histology (SRH); a rapid, label-free, non-consumptive, optical imaging method; and large-scale, public genomic data. In a prospective, multicenter, international testing cohort of patients with diffuse glioma ($n=153$) who underwent real-time SRH imaging, we demonstrate that DeepGlioma can predict the molecular alterations used by the World Health Organization to define the adult-type diffuse glioma taxonomy (IDH mutation, 1p19q co-deletion and ATRX mutation), achieving a mean molecular classification accuracy of $93.3\pm 1.6\%$. Our results represent how artificial intelligence and optical histology can be used to provide a rapid and scalable adjunct to wet lab methods for the molecular screening of patients with diffuse glioma.
    FlexiViT: One Model for All Patch Sizes. (arXiv:2212.08013v2 [cs.CV] UPDATED)
    Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
    Towards Fair Patient-Trial Matching via Patient-Criterion Level Fairness Constraint. (arXiv:2303.13790v1 [cs.LG])
    Clinical trials are indispensable in developing new treatments, but they face obstacles in patient recruitment and retention, hindering the enrollment of necessary participants. To tackle these challenges, deep learning frameworks have been created to match patients to trials. These frameworks calculate the similarity between patients and clinical trial eligibility criteria, considering the discrepancy between inclusion and exclusion criteria. Recent studies have shown that these frameworks outperform earlier approaches. However, deep learning models may raise fairness issues in patient-trial matching when certain sensitive groups of individuals are underrepresented in clinical trials, leading to incomplete or inaccurate data and potential harm. To tackle the issue of fairness, this work proposes a fair patient-trial matching framework by generating a patient-criterion level fairness constraint. The proposed framework considers the inconsistency between the embedding of inclusion and exclusion criteria among patients of different sensitive groups. The experimental results on real-world patient-trial and patient-criterion matching tasks demonstrate that the proposed framework can successfully alleviate the predictions that tend to be biased.
    Topological Reconstruction of Particle Physics Processes using Graph Neural Networks. (arXiv:2303.13937v1 [hep-ph])
    We present a new approach, the Topograph, which reconstructs underlying physics processes, including the intermediary particles, by leveraging underlying priors from the nature of particle physics decays and the flexibility of message passing graph neural networks. The Topograph not only solves the combinatoric assignment of observed final state objects, associating them to their original mother particles, but directly predicts the properties of intermediate particles in hard scatter processes and their subsequent decays. In comparison to standard combinatoric approaches or modern approaches using graph neural networks, which scale exponentially or quadratically, the complexity of Topographs scales linearly with the number of reconstructed objects. We apply Topographs to top quark pair production in the all hadronic decay channel, where we outperform the standard approach and match the performance of the state-of-the-art machine learning technique.
    High Fidelity Image Synthesis With Deep VAEs In Latent Space. (arXiv:2303.13714v1 [cs.CV])
    We present fast, realistic image generation on high-resolution, multimodal datasets using hierarchical variational autoencoders (VAEs) trained on a deterministic autoencoder's latent space. In this two-stage setup, the autoencoder compresses the image into its semantic features, which are then modeled with a deep VAE. With this method, the VAE avoids modeling the fine-grained details that constitute the majority of the image's code length, allowing it to focus on learning its structural components. We demonstrate the effectiveness of our two-stage approach, achieving a FID of 9.34 on the ImageNet-256 dataset which is comparable to BigGAN. We make our implementation available online.
    An investigation of licensing of datasets for machine learning based on the GQM model. (arXiv:2303.13735v1 [cs.SE])
    Dataset licensing is currently an issue in the development of machine learning systems. And in the development of machine learning systems, the most widely used are publicly available datasets. However, since the images in the publicly available dataset are mainly obtained from the Internet, some images are not commercially available. Furthermore, developers of machine learning systems do not often care about the license of the dataset when training machine learning models with it. In summary, the licensing of datasets for machine learning systems is in a state of incompleteness in all aspects at this stage. Our investigation of two collection datasets revealed that most of the current datasets lacked licenses, and the lack of licenses made it impossible to determine the commercial availability of the datasets. Therefore, we decided to take a more scientific and systematic approach to investigate the licensing of datasets and the licensing of machine learning systems that use the dataset to make it easier and more compliant for future developers of machine learning systems.
    EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms. (arXiv:2303.13745v1 [cs.LG])
    Automated design of efficient transformer models has recently attracted significant attention from industry and academia. However, most works only focus on certain metrics while searching for the best-performing transformer architecture. Furthermore, running traditional, complex, and large transformer models on low-compute edge platforms is a challenging problem. In this work, we propose a framework, called ProTran, to profile the hardware performance measures for a design space of transformer architectures and a diverse set of edge devices. We use this profiler in conjunction with the proposed co-design technique to obtain the best-performing models that have high accuracy on the given task and minimize latency, energy consumption, and peak power draw to enable edge deployment. We refer to our framework for co-optimizing accuracy and hardware performance measures as EdgeTran. It searches for the best transformer model and edge device pair. Finally, we propose GPTran, a multi-stage block-level grow-and-prune post-processing step that further improves accuracy in a hardware-aware manner. The obtained transformer model is 2.8$\times$ smaller and has a 0.8% higher GLUE score than the baseline (BERT-Base). Inference with it on the selected edge device enables 15.0% lower latency, 10.0$\times$ lower energy, and 10.8$\times$ lower peak power draw compared to an off-the-shelf GPU.
    POTATO: The Portable Text Annotation Tool. (arXiv:2212.08620v2 [cs.CL] UPDATED)
    We present POTATO, the Portable text annotation tool, a free, fully open-sourced annotation system that 1) supports labeling many types of text and multimodal data; 2) offers easy-to-configure features to maximize the productivity of both deployers and annotators (convenient templates for common ML/NLP tasks, active learning, keypress shortcuts, keyword highlights, tooltips); and 3) supports a high degree of customization (editable UI, inserting pre-screening questions, attention and qualification tests). Experiments over two annotation tasks suggest that POTATO improves labeling speed through its specially-designed productivity features, especially for long documents and complex tasks. POTATO is available at https://github.com/davidjurgens/potato and will continue to be updated.
    LONGNN: Spectral GNNs with Learnable Orthonormal Basis. (arXiv:2303.13750v1 [cs.LG])
    In recent years, a plethora of spectral graph neural networks (GNN) methods have utilized polynomial basis with learnable coefficients to achieve top-tier performances on many node-level tasks. Although various kinds of polynomial bases have been explored, each such method adopts a fixed polynomial basis which might not be the optimal choice for the given graph. Besides, we identify the so-called over-passing issue of these methods and show that it is somewhat rooted in their less-principled regularization strategy and unnormalized basis. In this paper, we make the first attempts to address these two issues. Leveraging Jacobi polynomials, we design a novel spectral GNN, LON-GNN, with Learnable OrthoNormal bases and prove that regularizing coefficients becomes equivalent to regularizing the norm of learned filter function now. We conduct extensive experiments on diverse graph datasets to evaluate the fitting and generalization capability of LON-GNN, where the results imply its superiority.
    End-to-End Diffusion Latent Optimization Improves Classifier Guidance. (arXiv:2303.13703v1 [cs.CV])
    Classifier guidance -- using the gradients of an image classifier to steer the generations of a diffusion model -- has the potential to dramatically expand the creative control over image generation and editing. However, currently classifier guidance requires either training new noise-aware models to obtain accurate gradients or using a one-step denoising approximation of the final generation, which leads to misaligned gradients and sub-optimal control. We highlight this approximation's shortcomings and propose a novel guidance method: Direct Optimization of Diffusion Latents (DOODL), which enables plug-and-play guidance by optimizing diffusion latents w.r.t. the gradients of a pre-trained classifier on the true generated pixels, using an invertible diffusion process to achieve memory-efficient backpropagation. Showcasing the potential of more precise guidance, DOODL outperforms one-step classifier guidance on computational and human evaluation metrics across different forms of guidance: using CLIP guidance to improve generations of complex prompts from DrawBench, using fine-grained visual classifiers to expand the vocabulary of Stable Diffusion, enabling image-conditioned generation with a CLIP visual encoder, and improving image aesthetics using an aesthetic scoring network.
    Building artificial neural circuits for domain-general cognition: a primer on brain-inspired systems-level architecture. (arXiv:2303.13651v1 [cs.NE])
    There is a concerted effort to build domain-general artificial intelligence in the form of universal neural network models with sufficient computational flexibility to solve a wide variety of cognitive tasks but without requiring fine-tuning on individual problem spaces and domains. To do this, models need appropriate priors and inductive biases, such that trained models can generalise to out-of-distribution examples and new problem sets. Here we provide an overview of the hallmarks endowing biological neural networks with the functionality needed for flexible cognition, in order to establish which features might also be important to achieve similar functionality in artificial systems. We specifically discuss the role of system-level distribution of network communication and recurrence, in addition to the role of short-term topological changes for efficient local computation. As machine learning models become more complex, these principles may provide valuable directions in an otherwise vast space of possible architectures. In addition, testing these inductive biases within artificial systems may help us to understand the biological principles underlying domain-general cognition.
    Clustering based on Mixtures of Sparse Gaussian Processes. (arXiv:2303.13665v1 [cs.LG])
    Creating low dimensional representations of a high dimensional data set is an important component in many machine learning applications. How to cluster data using their low dimensional embedded space is still a challenging problem in machine learning. In this article, we focus on proposing a joint formulation for both clustering and dimensionality reduction. When a probabilistic model is desired, one possible solution is to use the mixture models in which both cluster indicator and low dimensional space are learned. Our algorithm is based on a mixture of sparse Gaussian processes, which is called Sparse Gaussian Process Mixture Clustering (SGP-MIC). The main advantages to our approach over existing methods are that the probabilistic nature of this model provides more advantages over existing deterministic methods, it is straightforward to construct non-linear generalizations of the model, and applying a sparse model and an efficient variational EM approximation help to speed up the algorithm.
    Associated Random Neural Networks for Collective Classification of Nodes in Botnet Attacks. (arXiv:2303.13627v1 [cs.NI])
    Botnet attacks are a major threat to networked systems because of their ability to turn the network nodes that they compromise into additional attackers, leading to the spread of high volume attacks over long periods. The detection of such Botnets is complicated by the fact that multiple network IP addresses will be simultaneously compromised, so that Collective Classification of compromised nodes, in addition to the already available traditional methods that focus on individual nodes, can be useful. Thus this work introduces a collective Botnet attack classification technique that operates on traffic from an n-node IP network with a novel Associated Random Neural Network (ARNN) that identifies the nodes which are compromised. The ARNN is a recurrent architecture that incorporates two mutually associated, interconnected and architecturally identical n-neuron random neural networks, that act simultneously as mutual critics to reach the decision regarding which of n nodes have been compromised. A novel gradient learning descent algorithm is presented for the ARNN, and is shown to operate effectively both with conventional off-line training from prior data, and with on-line incremental training without prior off-line learning. Real data from a 107 node packet network is used with over 700,000 packets to evaluate the ARNN, showing that it provides accurate predictions. Comparisons with other well-known state of the art methods using the same learning and testing datasets, show that the ARNN offers significantly better performance.
    Extracting real estate values of rental apartment floor plans using graph convolutional networks. (arXiv:2303.13568v1 [cs.LG])
    Access graphs that indicate adjacency relationships from the perspective of flow lines of rooms are extracted automatically from a large number of floor plan images of a family-oriented rental apartment complex in Osaka Prefecture, Japan, based on a recently proposed access graph extraction method with slight modifications. We define and implement a graph convolutional network (GCN) for access graphs and propose a model to estimate the real estate value of access graphs as the floor plan value. The model, which includes the floor plan value and hedonic method using other general explanatory variables, is used to estimate rents and their estimation accuracies are compared. In addition, the features of the floor plan that explain the rent are analyzed from the learned convolution network. Therefore, a new model for comprehensively estimating the value of real estate floor plans is proposed and validated. The results show that the proposed method significantly improves the accuracy of rent estimation compared to that of conventional models, and it is possible to understand the specific spatial configuration rules that influence the value of a floor plan by analyzing the learned GCN.
    Enhancing Embedding Representations of Biomedical Data using Logic Knowledge. (arXiv:2303.13566v1 [cs.AI])
    Knowledge Graph Embeddings (KGE) have become a quite popular class of models specifically devised to deal with ontologies and graph structure data, as they can implicitly encode statistical dependencies between entities and relations in a latent space. KGE techniques are particularly effective for the biomedical domain, where it is quite common to deal with large knowledge graphs underlying complex interactions between biological and chemical objects. Recently in the literature, the PharmKG dataset has been proposed as one of the most challenging knowledge graph biomedical benchmark, with hundreds of thousands of relational facts between genes, diseases and chemicals. Despite KGEs can scale to very large relational domains, they generally fail at representing more complex relational dependencies between facts, like logic rules, which may be fundamental in complex experimental settings. In this paper, we exploit logic rules to enhance the embedding representations of KGEs on the PharmKG dataset. To this end, we adopt Relational Reasoning Network (R2N), a recently proposed neural-symbolic approach showing promising results on knowledge graph completion tasks. An R2N uses the available logic rules to build a neural architecture that reasons over KGE latent representations. In the experiments, we show that our approach is able to significantly improve the current state-of-the-art on the PharmKG dataset. Finally, we provide an ablation study to experimentally compare the effect of alternative sets of rules according to different selection criteria and varying the number of considered rules.
    Efficient decentralized multi-agent learning in asymmetric bipartite queueing systems. (arXiv:2206.03324v2 [cs.LG] UPDATED)
    We study decentralized multi-agent learning in bipartite queueing systems, a standard model for service systems. In particular, N agents request service from K servers in a fully decentralized way, i.e, by running the same algorithm without communication. Previous decentralized algorithms are restricted to symmetric systems, have performance that is degrading exponentially in the number of servers, require communication through shared randomness and unique agent identities, and are computationally demanding. In contrast, we provide a simple learning algorithm that, when run decentrally by each agent, leads the queueing system to have efficient performance in general asymmetric bipartite queueing systems while also having additional robustness properties. Along the way, we provide the first provably efficient UCB-based algorithm for the centralized case of the problem.
    A Closer Look at Scoring Functions and Generalization Prediction. (arXiv:2303.13589v1 [cs.LG])
    Generalization error predictors (GEPs) aim to predict model performance on unseen distributions by deriving dataset-level error estimates from sample-level scores. However, GEPs often utilize disparate mechanisms (e.g., regressors, thresholding functions, calibration datasets, etc), to derive such error estimates, which can obfuscate the benefits of a particular scoring function. Therefore, in this work, we rigorously study the effectiveness of popular scoring functions (confidence, local manifold smoothness, model agreement), independent of mechanism choice. We find, absent complex mechanisms, that state-of-the-art confidence- and smoothness- based scores fail to outperform simple model-agreement scores when estimating error under distribution shifts and corruptions. Furthermore, on realistic settings where the training data has been compromised (e.g., label noise, measurement noise, undersampling), we find that model-agreement scores continue to perform well and that ensemble diversity is important for improving its performance. Finally, to better understand the limitations of scoring functions, we demonstrate that simplicity bias, or the propensity of deep neural networks to rely upon simple but brittle features, can adversely affect GEP performance. Overall, our work carefully studies the effectiveness of popular scoring functions in realistic settings and helps to better understand their limitations.
    Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback. (arXiv:2303.13604v1 [cs.LG])
    This paper investigates the problem of combinatorial multiarmed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other words, the delayed feedback is composed of components of rewards from past actions, with unknown division among the sub-components. Three models of delayed feedback: bounded adversarial, stochastic independent, and stochastic conditionally independent are studied, and regret bounds are derived for each of the delay models. Ignoring the problem dependent parameters, we show that regret bound for all the delay models is $\tilde{O}(T^{2/3} + T^{1/3} \nu)$ for time horizon $T$, where $\nu$ is a delay parameter defined differently in the three cases, thus demonstrating an additive term in regret with delay in all the three delay models. The considered algorithm is demonstrated to outperform other full-bandit approaches with delayed composite anonymous feedback.
    OFA$^2$: A Multi-Objective Perspective for the Once-for-All Neural Architecture Search. (arXiv:2303.13683v1 [cs.NE])
    Once-for-All (OFA) is a Neural Architecture Search (NAS) framework designed to address the problem of searching efficient architectures for devices with different resources constraints by decoupling the training and the searching stages. The computationally expensive process of training the OFA neural network is done only once, and then it is possible to perform multiple searches for subnetworks extracted from this trained network according to each deployment scenario. In this work we aim to give one step further in the search for efficiency by explicitly conceiving the search stage as a multi-objective optimization problem. A Pareto frontier is then populated with efficient, and already trained, neural architectures exhibiting distinct trade-offs among the conflicting objectives. This could be achieved by using any multi-objective evolutionary algorithm during the search stage, such as NSGA-II and SMS-EMOA. In other words, the neural network is trained once, the searching for subnetworks considering different hardware constraints is also done one single time, and then the user can choose a suitable neural network according to each deployment scenario. The conjugation of OFA and an explicit algorithm for multi-objective optimization opens the possibility of a posteriori decision-making in NAS, after sampling efficient subnetworks which are a very good approximation of the Pareto frontier, given that those subnetworks are already trained and ready to use. The source code and the final search algorithm will be released at https://github.com/ito-rafael/once-for-all-2
    Une comparaison des algorithmes d'apprentissage pour la survie avec donn\'ees manquantes. (arXiv:2303.13590v1 [stat.ML])
    Survival analysis is an essential tool for the study of health data. An inherent component of such data is the presence of missing values. In recent years, researchers proposed new learning algorithms for survival tasks based on neural networks. Here, we studied the predictive performance of such algorithms coupled with different methods for handling missing values on simulated data that reflect a realistic situation, i.e., when individuals belong to unobserved clusters. We investigated different patterns of missing data. The results show that, without further feature engineering, no single imputation method is better than the others in all cases. The proposed methodology can be used to compare other missing data patterns and/or survival models. The Python code is accessible via the package survivalsim. -- L'analyse de survie est un outil essentiel pour l'\'etude des donn\'ees de sant\'e. Une composante inh\'erente \`a ces donn\'ees est la pr\'esence de valeurs manquantes. Ces derni\`eres ann\'ees, de nouveaux algorithmes d'apprentissage pour la survie, bas\'es sur les r\'eseaux de neurones, ont \'et\'e con\c{c}us. L'objectif de ce travail est d'\'etudier la performance en pr\'ediction de ces algorithmes coupl\'es \`a diff\'erentes m\'ethodes pour g\'erer les valeurs manquantes, sur des donn\'ees simul\'ees qui refl\`etent une situation rencontr\'ee en pratique, c'est-\`a dire lorsque les individus peuvent \^etre group\'es selon leurs covariables. Diff\'erents sch\'emas de donn\'ees manquantes sont \'etudi\'es. Les r\'esultats montrent que, sans l'ajout de variables suppl\'ementaires, aucune m\'ethode d'imputation n'est meilleure que les autres dans tous les cas. La m\'ethodologie propos\'ee peut \^etre utilis\'ee pour comparer d'autres mod\`eles de survie. Le code en Python est accessible via le package survivalsim.
    Efficient hybrid modeling and sorption model discovery for non-linear advection-diffusion-sorption systems: A systematic scientific machine learning approach. (arXiv:2303.13555v1 [cs.CE])
    This study presents a systematic machine learning approach for creating efficient hybrid models and discovering sorption uptake models in non-linear advection-diffusion-sorption systems. It demonstrates an effective method to train these complex systems using gradientbased optimizers, adjoint sensitivity analysis, and JIT-compiled vector Jacobian products, combined with spatial discretization and adaptive integrators. Sparse and symbolic regression were employed to identify missing functions in the artificial neural network. The robustness of the proposed method was tested on an in-silico data set of noisy breakthrough curve observations of fixed-bed adsorption, resulting in a well-fitted hybrid model. The study successfully reconstructed sorption uptake kinetics using sparse and symbolic regression, and accurately predicted breakthrough curves using identified polynomials, highlighting the potential of the proposed framework for discovering sorption kinetic law structures.
    Uncertainty-Aware Workload Prediction in Cloud Computing. (arXiv:2303.13525v1 [cs.DC])
    Predicting future resource demand in Cloud Computing is essential for managing Cloud data centres and guaranteeing customers a minimum Quality of Service (QoS) level. Modelling the uncertainty of future demand improves the quality of the prediction and reduces the waste due to overallocation. In this paper, we propose univariate and bivariate Bayesian deep learning models to predict the distribution of future resource demand and its uncertainty. We design different training scenarios to train these models, where each procedure is a different combination of pretraining and fine-tuning steps on multiple datasets configurations. We also compare the bivariate model to its univariate counterpart training with one or more datasets to investigate how different components affect the accuracy of the prediction and impact the QoS. Finally, we investigate whether our models have transfer learning capabilities. Extensive experiments show that pretraining with multiple datasets boosts performances while fine-tuning does not. Our models generalise well on related but unseen time series, proving transfer learning capabilities. Runtime performance analysis shows that the models are deployable in real-world applications. For this study, we preprocessed twelve datasets from real-world traces in a consistent and detailed way and made them available to facilitate the research in this field.
    Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games. (arXiv:2303.13539v1 [cs.LG])
    Stochastic games are a popular framework for studying multi-agent reinforcement learning (MARL). Recent advances in MARL have focused primarily on games with finitely many states. In this work, we study multi-agent learning in stochastic games with general state spaces and an information structure in which agents do not observe each other's actions. In this context, we propose a decentralized MARL algorithm and we prove the near-optimality of its policy updates. Furthermore, we study the global policy-updating dynamics for a general class of best-reply based algorithms and derive a closed-form characterization of convergence probabilities over the joint policy space.
    Graph Tensor Networks: An Intuitive Framework for Designing Large-Scale Neural Learning Systems on Multiple Domains. (arXiv:2303.13565v1 [cs.LG])
    Despite the omnipresence of tensors and tensor operations in modern deep learning, the use of tensor mathematics to formally design and describe neural networks is still under-explored within the deep learning community. To this end, we introduce the Graph Tensor Network (GTN) framework, an intuitive yet rigorous graphical framework for systematically designing and implementing large-scale neural learning systems on both regular and irregular domains. The proposed framework is shown to be general enough to include many popular architectures as special cases, and flexible enough to handle data on any and many data domains. The power and flexibility of the proposed framework is demonstrated through real-data experiments, resulting in improved performance at a drastically lower complexity costs, by virtue of tensor algebra.
    Return of the RNN: Residual Recurrent Networks for Invertible Sentence Embeddings. (arXiv:2303.13570v1 [cs.CL])
    This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task. Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors. The model achieves high accuracy and fast training with the ADAM optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods. We incorporate residual connections and introduce a "match drop" technique, where gradients are calculated only for incorrect words. Our approach demonstrates potential for various natural language processing applications, particularly in neural network-based systems that require high-quality sentence embeddings.
    Bluetooth and WiFi Dataset for Real World RF Fingerprinting of Commercial Devices. (arXiv:2303.13538v1 [cs.NI])
    RF fingerprinting is emerging as a physical layer security scheme to identify illegitimate and/or unauthorized emitters sharing the RF spectrum. However, due to the lack of publicly accessible real-world datasets, most research focuses on generating synthetic waveforms with software-defined radios (SDRs) which are not suited for practical deployment settings. On other hand, the limited datasets that are available focus only on chipsets that generate only one kind of waveform. Commercial off-the-shelf (COTS) combo chipsets that support two wireless standards (for example WiFi and Bluetooth) over a shared dual-band antenna such as those found in laptops, adapters, wireless chargers, Raspberry Pis, among others are becoming ubiquitous in the IoT realm. Hence, to keep up with the modern IoT environment, there is a pressing need for real-world open datasets capturing emissions from these combo chipsets transmitting heterogeneous communication protocols. To this end, we capture the first known emissions from the COTS IoT chipsets transmitting WiFi and Bluetooth under two different time frames. The different time frames are essential to rigorously evaluate the generalization capability of the models. To ensure widespread use, each capture within the comprehensive 72 GB dataset is long enough (40 MSamples) to support diverse input tensor lengths and formats. Finally, the dataset also comprises emissions at varying signal powers to account for the feeble to high signal strength emissions as encountered in a real-world setting.
    Learning unidirectional coupling using echo-state network. (arXiv:2303.13562v1 [cs.LG])
    Reservoir Computing has found many potential applications in the field of complex dynamics. In this article, we exploit the exceptional capability of the echo-state network (ESN) model to make it learn a unidirectional coupling scheme from only a few time series data of the system. We show that, once trained with a few example dynamics of a drive-response system, the machine is able to predict the response system's dynamics for any driver signal with the same coupling. Only a few time series data of an $A-B$ type drive-response system in training is sufficient for the ESN to learn the coupling scheme. After training even if we replace drive system $A$ with a different system $C$, the ESN can reproduce the dynamics of response system $B$ using the dynamics of new drive system $C$ only.
    Towards risk-informed PBSHM: Populations as hierarchical systems. (arXiv:2303.13533v1 [cs.AI])
    The prospect of informed and optimal decision-making regarding the operation and maintenance (O&M) of structures provides impetus to the development of structural health monitoring (SHM) systems. A probabilistic risk-based framework for decision-making has already been proposed. However, in order to learn the statistical models necessary for decision-making, measured data from the structure of interest are required. Unfortunately, these data are seldom available across the range of environmental and operational conditions necessary to ensure good generalisation of the model. Recently, technologies have been developed that overcome this challenge, by extending SHM to populations of structures, such that valuable knowledge may be transferred between instances of structures that are sufficiently similar. This new approach is termed population-based structural heath monitoring (PBSHM). The current paper presents a formal representation of populations of structures, such that risk-based decision processes may be specified within them. The population-based representation is an extension to the hierarchical representation of a structure used within the probabilistic risk-based decision framework to define fault trees. The result is a series, consisting of systems of systems ranging from the individual component level up to an inventory of heterogeneous populations. The current paper considers an inventory of wind farms as a motivating example and highlights the inferences and decisions that can be made within the hierarchical representation.
    marl-jax: Multi-agent Reinforcement Leaning framework for Social Generalization. (arXiv:2303.13808v1 [cs.MA])
    Recent advances in Reinforcement Learning (RL) have led to many exciting applications. These advancements have been driven by improvements in both algorithms and engineering, which have resulted in faster training of RL agents. We present marl-jax, a multi-agent reinforcement learning software package for training and evaluating social generalization of the agents. The package is designed for training a population of agents in multi-agent environments and evaluating their ability to generalize to diverse background agents. It is built on top of DeepMind's JAX ecosystem~\cite{deepmind2020jax} and leverages the RL ecosystem developed by DeepMind. Our framework marl-jax is capable of working in cooperative and competitive, simultaneous-acting environments with multiple agents. The package offers an intuitive and user-friendly command-line interface for training a population and evaluating its generalization capabilities. In conclusion, marl-jax provides a valuable resource for researchers interested in exploring social generalization in the context of MARL. The open-source code for marl-jax is available at: \href{https://github.com/kinalmehta/marl-jax}{https://github.com/kinalmehta/marl-jax}
    Editing Driver Character: Socially-Controllable Behavior Generation for Interactive Traffic Simulation. (arXiv:2303.13830v1 [cs.RO])
    Traffic simulation plays a crucial role in evaluating and improving autonomous driving planning systems. After being deployed on public roads, autonomous vehicles need to interact with human road participants with different social preferences (e.g., selfish or courteous human drivers). To ensure that autonomous vehicles take safe and efficient maneuvers in different interactive traffic scenarios, we should be able to evaluate autonomous vehicles against reactive agents with different social characteristics in the simulation environment. We propose a socially-controllable behavior generation (SCBG) model for this purpose, which allows the users to specify the level of courtesy of the generated trajectory while ensuring realistic and human-like trajectory generation through learning from real-world driving data. Specifically, we define a novel and differentiable measure to quantify the level of courtesy of driving behavior, leveraging marginal and conditional behavior prediction models trained from real-world driving data. The proposed courtesy measure allows us to auto-label the courtesy levels of trajectories from real-world driving data and conveniently train an SCBG model generating trajectories based on the input courtesy values. We examined the SCBG model on the Waymo Open Motion Dataset (WOMD) and showed that we were able to control the SCBG model to generate realistic driving behaviors with desired courtesy levels. Interestingly, we found that the SCBG model was able to identify different motion patterns of courteous behaviors according to the scenarios.
    Decompositional Quantum Graph Neural Network. (arXiv:2201.05158v2 [quant-ph] UPDATED)
    Quantum machine learning is a fast-emerging field that aims to tackle machine learning using quantum algorithms and quantum computing. Due to the lack of physical qubits and an effective means to map real-world data from Euclidean space to Hilbert space, most of these methods focus on quantum analogies or process simulations rather than devising concrete architectures based on qubits. In this paper, we propose a novel hybrid quantum-classical algorithm for graph-structured data, which we refer to as the Ego-graph based Quantum Graph Neural Network (egoQGNN). egoQGNN implements the GNN theoretical framework using the tensor product and unity matrix representation, which greatly reduces the number of model parameters required. When controlled by a classical computer, egoQGNN can accommodate arbitrarily sized graphs by processing ego-graphs from the input graph using a modestly-sized quantum device. The architecture is based on a novel mapping from real-world data to Hilbert space. This mapping maintains the distance relations present in the data and reduces information loss. Experimental results show that the proposed method outperforms competitive state-of-the-art models with only 1.68\% parameters compared to those models.
    Federated Learning on Heterogenous Data using Chest CT. (arXiv:2303.13567v1 [cs.LG])
    Large data have accelerated advances in AI. While it is well known that population differences from genetics, sex, race, diet, and various environmental factors contribute significantly to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. Such limitation stems from barriers to large-scale data share in medicine and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI development that enables learning across hospitals without data share. In this study, we show the results of various FL strategies on one of the largest and most diverse COVID-19 chest CT datasets: 21 participating hospitals across five continents that comprise >10,000 patients with >1 million images. We present three techniques: Fed Averaging (FedAvg), Incremental Institutional Learning (IIL), and Cyclical Incremental Institutional Learning (CIIL). We also propose an FL strategy that leverages synthetically generated data to overcome class imbalances and data size disparities across centers. We show that FL can achieve comparable performance to Centralized Data Sharing (CDS) while maintaining high performance across sites with small, underrepresented data. We investigate the strengths and weaknesses for all technical approaches on this heterogeneous dataset including the robustness to non-Independent and identically distributed (non-IID) diversity of data. We also describe the sources of data heterogeneity such as age, sex, and site locations in the context of FL and show how even among the correctly labeled populations, disparities can arise due to these biases.
    A Graph Neural Network Approach to Nanosatellite Task Scheduling: Insights into Learning Mixed-Integer Models. (arXiv:2303.13773v1 [cs.LG])
    This study investigates how to schedule nanosatellite tasks more efficiently using Graph Neural Networks (GNN). In the Offline Nanosatellite Task Scheduling (ONTS) problem, the goal is to find the optimal schedule for tasks to be carried out in orbit while taking into account Quality-of-Service (QoS) considerations such as priority, minimum and maximum activation events, execution time-frames, periods, and execution windows, as well as constraints on the satellite's power resources and the complexity of energy harvesting and management. The ONTS problem has been approached using conventional mathematical formulations and precise methods, but their applicability to challenging cases of the problem is limited. This study examines the use of GNNs in this context, which has been effectively applied to many optimization problems, including traveling salesman problems, scheduling problems, and facility placement problems. Here, we fully represent MILP instances of the ONTS problem in bipartite graphs. We apply a feature aggregation and message-passing methodology allied to a ReLU activation function to learn using a classic deep learning model, obtaining an optimal set of parameters. Furthermore, we apply Explainable AI (XAI), another emerging field of research, to determine which features -- nodes, constraints -- had the most significant impact on learning performance, shedding light on the inner workings and decision process of such models. We also explored an early fixing approach by obtaining an accuracy above 80\% both in predicting the feasibility of a solution and the probability of a decision variable value being in the optimal solution. Our results point to GNNs as a potentially effective method for scheduling nanosatellite tasks and shed light on the advantages of explainable machine learning models for challenging combinatorial optimization problems.
    Adversarial Robustness and Feature Impact Analysis for Driver Drowsiness Detection. (arXiv:2303.13649v1 [cs.LG])
    Drowsy driving is a major cause of road accidents, but drivers are dismissive of the impact that fatigue can have on their reaction times. To detect drowsiness before any impairment occurs, a promising strategy is using Machine Learning (ML) to monitor Heart Rate Variability (HRV) signals. This work presents multiple experiments with different HRV time windows and ML models, a feature impact analysis using Shapley Additive Explanations (SHAP), and an adversarial robustness analysis to assess their reliability when processing faulty input data and perturbed HRV signals. The most reliable model was Extreme Gradient Boosting (XGB) and the optimal time window had between 120 and 150 seconds. Furthermore, SHAP enabled the selection of the 18 most impactful features and the training of new smaller models that achieved a performance as good as the initial ones. Despite the susceptibility of all models to adversarial attacks, adversarial training enabled them to preserve significantly higher results, especially XGB. Therefore, ML models can significantly benefit from realistic adversarial training to provide a more robust driver drowsiness detection.
    Regularization of polynomial networks for image recognition. (arXiv:2303.13896v1 [cs.CV])
    Deep Neural Networks (DNNs) have obtained impressive performance across tasks, however they still remain as black boxes, e.g., hard to theoretically analyze. At the same time, Polynomial Networks (PNs) have emerged as an alternative method with a promising performance and improved interpretability but have yet to reach the performance of the powerful DNN baselines. In this work, we aim to close this performance gap. We introduce a class of PNs, which are able to reach the performance of ResNet across a range of six benchmarks. We demonstrate that strong regularization is critical and conduct an extensive study of the exact regularization schemes required to match performance. To further motivate the regularization schemes, we introduce D-PolyNets that achieve a higher-degree of expansion than previously proposed polynomial networks. D-PolyNets are more parameter-efficient while achieving a similar performance as other polynomial networks. We expect that our new models can lead to an understanding of the role of elementwise activation functions (which are no longer required for training PNs). The source code is available at https://github.com/grigorisg9gr/regularized_polynomials.
    TinyML: Tools, Applications, Challenges, and Future Research Directions. (arXiv:2303.13569v1 [cs.LG])
    In recent years, Artificial Intelligence (AI) and Machine learning (ML) have gained significant interest from both, industry and academia. Notably, conventional ML techniques require enormous amounts of power to meet the desired accuracy, which has limited their use mainly to high-capability devices such as network nodes. However, with many advancements in technologies such as the Internet of Things (IoT) and edge computing, it is desirable to incorporate ML techniques into resource-constrained embedded devices for distributed and ubiquitous intelligence. This has motivated the emergence of the TinyML paradigm which is an embedded ML technique that enables ML applications on multiple cheap, resource- and power-constrained devices. However, during this transition towards appropriate implementation of the TinyML technology, multiple challenges such as processing capacity optimization, improved reliability, and maintenance of learning models' accuracy require timely solutions. In this article, various avenues available for TinyML implementation are reviewed. Firstly, a background of TinyML is provided, followed by detailed discussions on various tools supporting TinyML. Then, state-of-art applications of TinyML using advanced technologies are detailed. Lastly, various research challenges and future directions are identified.  ( 2 min )
    CH-Go: Online Go System Based on Chunk Data Storage. (arXiv:2303.13553v1 [cs.LG])
    The training and running of an online Go system require the support of effective data management systems to deal with vast data, such as the initial Go game records, the feature data set obtained by representation learning, the experience data set of self-play, the randomly sampled Monte Carlo tree, and so on. Previous work has rarely mentioned this problem, but the ability and efficiency of data management systems determine the accuracy and speed of the Go system. To tackle this issue, we propose an online Go game system based on the chunk data storage method (CH-Go), which processes the format of 160k Go game data released by Kiseido Go Server (KGS) and designs a Go encoder with 11 planes, a parallel processor and generator for better memory performance. Specifically, we store the data in chunks, take the chunk size of 1024 as a batch, and save the features and labels of each chunk as binary files. Then a small set of data is randomly sampled each time for the neural network training, which is accessed by batch through yield method. The training part of the prototype includes three modules: supervised learning module, reinforcement learning module, and an online module. Firstly, we apply Zobrist-guided hash coding to speed up the Go board construction. Then we train a supervised learning policy network to initialize the self-play for generation of experience data with 160k Go game data released by KGS. Finally, we conduct reinforcement learning based on REINFORCE algorithm. Experiments show that the training accuracy of CH- Go in the sampled 150 games is 99.14%, and the accuracy in the test set is as high as 98.82%. Under the condition of limited local computing power and time, we have achieved a better level of intelligence. Given the current situation that classical systems such as GOLAXY are not free and open, CH-Go has realized and maintained complete Internet openness.  ( 3 min )
    Optical Character Recognition and Transcription of Berber Signs from Images in a Low-Resource Language Amazigh. (arXiv:2303.13549v1 [cs.CV])
    The Berber, or Amazigh language family is a low-resource North African vernacular language spoken by the indigenous Berber ethnic group. It has its own unique alphabet called Tifinagh used across Berber communities in Morocco, Algeria, and others. The Afroasiatic language Berber is spoken by 14 million people, yet lacks adequate representation in education, research, web applications etc. For instance, there is no option of translation to or from Amazigh / Berber on Google Translate, which hosts over 100 languages today. Consequently, we do not find specialized educational apps, L2 (2nd language learner) acquisition, automated language translation, and remote-access facilities enabled in Berber. Motivated by this background, we propose a supervised approach called DaToBS for Detection and Transcription of Berber Signs. The DaToBS approach entails the automatic recognition and transcription of Tifinagh characters from signs in photographs of natural environments. This is achieved by self-creating a corpus of 1862 pre-processed character images; curating the corpus with human-guided annotation; and feeding it into an OCR model via the deployment of CNN for deep learning based on computer vision models. We deploy computer vision modeling (rather than language models) because there are pictorial symbols in this alphabet, this deployment being a novel aspect of our work. The DaToBS experimentation and analyses yield over 92 percent accuracy in our research. To the best of our knowledge, ours is among the first few works in the automated transcription of Berber signs from roadside images with deep learning, yielding high accuracy. This can pave the way for developing pedagogical applications in the Berber language, thereby addressing an important goal of outreach to underrepresented communities via AI in education.  ( 3 min )
    Artificial Intelligence for Sustainability: Facilitating Sustainable Smart Product-Service Systems with Computer Vision. (arXiv:2303.13540v1 [cs.LG])
    The usage and impact of deep learning for cleaner production and sustainability purposes remain little explored. This work shows how deep learning can be harnessed to increase sustainability in production and product usage. Specifically, we utilize deep learning-based computer vision to determine the wear states of products. The resulting insights serve as a basis for novel product-service systems with improved integration and result orientation. Moreover, these insights are expected to facilitate product usage improvements and R&D innovations. We demonstrate our approach on two products: machining tools and rotating X-ray anodes. From a technical standpoint, we show that it is possible to recognize the wear state of these products using deep-learning-based computer vision. In particular, we detect wear through microscopic images of the two products. We utilize a U-Net for semantic segmentation to detect wear based on pixel granularity. The resulting mean dice coefficients of 0.631 and 0.603 demonstrate the feasibility of the proposed approach. Consequently, experts can now make better decisions, for example, to improve the machining process parameters. To assess the impact of the proposed approach on environmental sustainability, we perform life cycle assessments that show gains for both products. The results indicate that the emissions of CO2 equivalents are reduced by 12% for machining tools and by 44% for rotating anodes. This work can serve as a guideline and inspire researchers and practitioners to utilize computer vision in similar scenarios to develop sustainable smart product-service systems and enable cleaner production.  ( 3 min )
    Skip Connections in Spiking Neural Networks: An Analysis of Their Effect on Network Training. (arXiv:2303.13563v1 [cs.NE])
    Spiking neural networks (SNNs) have gained attention as a promising alternative to traditional artificial neural networks (ANNs) due to their potential for energy efficiency and their ability to model spiking behavior in biological systems. However, the training of SNNs is still a challenging problem, and new techniques are needed to improve their performance. In this paper, we study the impact of skip connections on SNNs and propose a hyperparameter optimization technique that adapts models from ANN to SNN. We demonstrate that optimizing the position, type, and number of skip connections can significantly improve the accuracy and efficiency of SNNs by enabling faster convergence and increasing information flow through the network. Our results show an average +8% accuracy increase on CIFAR-10-DVS and DVS128 Gesture datasets adaptation of multiple state-of-the-art models.  ( 2 min )
    Labeled Subgraph Entropy Kernel. (arXiv:2303.13543v1 [cs.LG])
    In recent years, kernel methods are widespread in tasks of similarity measuring. Specifically, graph kernels are widely used in fields of bioinformatics, chemistry and financial data analysis. However, existing methods, especially entropy based graph kernels are subject to large computational complexity and the negligence of node-level information. In this paper, we propose a novel labeled subgraph entropy graph kernel, which performs well in structural similarity assessment. We design a dynamic programming subgraph enumeration algorithm, which effectively reduces the time complexity. Specially, we propose labeled subgraph, which enriches substructure topology with semantic information. Analogizing the cluster expansion process of gas cluster in statistical mechanics, we re-derive the partition function and calculate the global graph entropy to characterize the network. In order to test our method, we apply several real-world datasets and assess the effects in different tasks. To capture more experiment details, we quantitatively and qualitatively analyze the contribution of different topology structures. Experimental results successfully demonstrate the effectiveness of our method which outperforms several state-of-the-art methods.  ( 2 min )
    Enhancing Peak Network Traffic Prediction via Time-Series Decomposition. (arXiv:2303.13529v1 [cs.NI])
    For network administration and maintenance, it is critical to anticipate when networks will receive peak volumes of traffic so that adequate resources can be allocated to service requests made to servers. In the event that sufficient resources are not allocated to servers, they can become prone to failure and security breaches. On the contrary, we would waste a lot of resources if we always allocate the maximum amount of resources. Therefore, anticipating peak volumes in network traffic becomes an important problem. However, popular forecasting models such as Autoregressive Integrated Moving Average (ARIMA) forecast time-series data generally, thus lack in predicting peak volumes in these time-series. More than often, a time-series is a combination of different features, which may include but are not limited to 1) Trend, the general movement of the traffic volume, 2) Seasonality, the patterns repeated over some time periods (e.g. daily and monthly), and 3) Noise, the random changes in the data. Considering that the fluctuation of seasonality can be harmful for trend and peak prediction, we propose to extract seasonalities to facilitate the peak volume predictions in the time domain. The experiments on both synthetic and real network traffic data demonstrate the effectiveness of the proposed method.  ( 2 min )
    Hey Dona! Can you help me with student course registration?. (arXiv:2303.13548v1 [cs.HC])
    In this paper, we present a demo of an intelligent personal agent called Hey Dona (or just Dona) with virtual voice assistance in student course registration. It is a deployed project in the theme of AI for education. In this digital age with a myriad of smart devices, users often delegate tasks to agents. While pointing and clicking supersedes the erstwhile command-typing, modern devices allow users to speak commands for agents to execute tasks, enhancing speed and convenience. In line with this progress, Dona is an intelligent agent catering to student needs by automated, voice-operated course registration, spanning a multitude of accents, entailing task planning optimization, with some language translation as needed. Dona accepts voice input by microphone (Bluetooth, wired microphone), converts human voice to computer understandable language, performs query processing as per user commands, connects with the Web to search for answers, models task dependencies, imbibes quality control, and conveys output by speaking to users as well as displaying text, thus enabling human-AI interaction by speech cum text. It is meant to work seamlessly on desktops, smartphones etc. and in indoor as well as outdoor settings. To the best of our knowledge, Dona is among the first of its kind as an intelligent personal agent for voice assistance in student course registration. Due to its ubiquitous access for educational needs, Dona directly impacts AI for education. It makes a broader impact on smart city characteristics of smart living and smart people due to its contributions to providing benefits for new ways of living and assisting 21st century education, respectively.  ( 3 min )
  • Open

    Differentially Private Synthetic Control. (arXiv:2303.14084v1 [cs.LG])
    Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., donor pool ) to predict a counterfactual time series of interest (i.e., target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit, and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as providing guidance to practitioners for hyperparameter tuning.  ( 2 min )
    Deep Conditional Measure Quantization. (arXiv:2301.06907v2 [stat.ML] UPDATED)
    Quantization of a probability measure means representing it with a finite set of Dirac masses that approximates the input distribution well enough (in some metric space of probability measures). Various methods exists to do so, but the situation of quantizing a conditional law has been less explored. We propose a method, called DCMQ, involving a Huber-energy kernel-based approach coupled with a deep neural network architecture. The method is tested on several examples and obtains promising results.
    Data thinning for convolution-closed distributions. (arXiv:2301.07276v2 [stat.ME] UPDATED)
    We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This very general proposal is applicable to any convolution-closed distribution, a class that includes the Gaussian, Poisson, negative binomial, gamma, and binomial distributions, among others. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the usual approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. In simulations and in an application to single-cell RNA-sequencing data, we show that data thinning can be used to validate the results of unsupervised learning approaches, such as k-means clustering and principal components analysis.
    Forecasting Competitions with Correlated Events. (arXiv:2303.13793v1 [cs.LG])
    Beginning with Witkowski et al. [2022], recent work on forecasting competitions has addressed incentive problems with the common winner-take-all mechanism. Frongillo et al. [2021] propose a competition mechanism based on follow-the-regularized-leader (FTRL), an online learning framework. They show that their mechanism selects an $\epsilon$-optimal forecaster with high probability using only $O(\log(n)/\epsilon^2)$ events. These works, together with all prior work on this problem thus far, assume that events are independent. We initiate the study of forecasting competitions for correlated events. To quantify correlation, we introduce a notion of block correlation, which allows each event to be strongly correlated with up to $b$ others. We show that under distributions with this correlation, the FTRL mechanism retains its $\epsilon$-optimal guarantee using $O(b^2 \log(n)/\epsilon^2)$ events. Our proof involves a novel concentration bound for correlated random variables which may be of broader interest.
    Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test. (arXiv:2211.16596v3 [stat.ML] UPDATED)
    Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links, between random variables in a dynamic setting, that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel statistical independence test on data collected from time-invariant dynamical systems in which rare but consequential events occur. In particular, we exploit the time-invariance of the underlying data to construct a superimposed dataset of the system state before rare events happen at different timesteps. We then design a conditional independence test on the reorganized data. We provide non-asymptotic sample complexity bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets, including incident data collected from the Caltrans Performance Measurement System (PeMS). Code containing the datasets and experiments is publicly available.
    Near Optimal Adversarial Attack on UCB Bandits. (arXiv:2008.09312v3 [cs.LG] UPDATED)
    We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.
    Continuous Mixtures of Tractable Probabilistic Models. (arXiv:2209.10584v3 [cs.LG] UPDATED)
    Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, and thus are capable of performing exact inference efficiently but often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points the approximation becomes de-facto exact. Moreover, for a finite set of integration points, the integration method effectively compiles the continuous mixture into a standard PC. In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set new state of the art for tractable models on many standard density estimation benchmarks.
    Physics-informed neural networks in the recreation of hydrodynamic simulations from dark matter. (arXiv:2303.14090v1 [astro-ph.CO])
    Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, while the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has led to the emergence of machine learning algorithms for baryon inpainting as an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties, successful recovery of the fundamental metallicity relation, and retrieve scatter that traces the target simulation's distribution.  ( 2 min )
    Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions. (arXiv:2209.15055v4 [stat.ML] UPDATED)
    We show that the representation cost of fully connected neural networks with homogeneous nonlinearities - which describes the implicit bias in function space of networks with $L_2$-regularization or with losses such as the cross-entropy - converges as the depth of the network goes to infinity to a notion of rank over nonlinear functions. We then inquire under which conditions the global minima of the loss recover the `true' rank of the data: we show that for too large depths the global minimum will be approximately rank 1 (underestimating the rank); we then argue that there is a range of depths which grows with the number of datapoints where the true rank is recovered. Finally, we discuss the effect of the rank of a classifier on the topology of the resulting class boundaries and show that autoencoders with optimal nonlinear rank are naturally denoising.  ( 2 min )
    Efficient Algorithms for Learning from Coarse Labels. (arXiv:2108.09805v2 [cs.LG] UPDATED)
    For many learning problems one may not have access to fine grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and study the problem of learning from such coarse data. Instead of observing the actual labels from a set $\mathcal{Z}$, we observe coarse labels corresponding to a partition of $\mathcal{Z}$ (or a mixture of partitions). Our main algorithmic result is that essentially any problem learnable from fine grained labels can also be learned efficiently when the coarse data are sufficiently informative. We obtain our result through a generic reduction for answering Statistical Queries (SQ) over fine grained labels given only coarse labels. The number of coarse labels required depends polynomially on the information distortion due to coarsening and the number of fine labels $|\mathcal{Z}|$. We also investigate the case of (infinitely many) real valued labels focusing on a central problem in censored and truncated statistics: Gaussian mean estimation from coarse data. We provide an efficient algorithm when the sets in the partition are convex and establish that the problem is NP-hard even for very simple non-convex sets.  ( 2 min )
    How many dimensions are required to find an adversarial example?. (arXiv:2303.14173v1 [cs.LG])
    Past work exploring adversarial vulnerability have focused on situations where an adversary can perturb all dimensions of model input. On the other hand, a range of recent works consider the case where either (i) an adversary can perturb a limited number of input parameters or (ii) a subset of modalities in a multimodal problem. In both of these cases, adversarial examples are effectively constrained to a subspace $V$ in the ambient input space $\mathcal{X}$. Motivated by this, in this work we investigate how adversarial vulnerability depends on $\dim(V)$. In particular, we show that the adversarial success of standard PGD attacks with $\ell^p$ norm constraints behaves like a monotonically increasing function of $\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$ where $\epsilon$ is the perturbation budget and $\frac{1}{p} + \frac{1}{q} =1$, provided $p > 1$ (the case $p=1$ presents additional subtleties which we analyze in some detail). This functional form can be easily derived from a simple toy linear model, and as such our results land further credence to arguments that adversarial examples are endemic to locally linear models on high dimensional spaces.  ( 2 min )
    Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents. (arXiv:2011.13704v2 [stat.ML] UPDATED)
    Discrete latent variables are considered important for real world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, which has motivated different strategies to manipulate discrete distributions in order to train discrete VAEs similarly to conventional ones. Here we ask if it is also possible to keep the discrete nature of the latents fully intact by applying a direct discrete optimization for the encoding model. The approach is consequently strongly diverting from standard VAE-training by sidestepping sampling approximation, reparameterization trick and amortization. Discrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms. For VAEs with binary latents, we (A) show how such a discrete variational method ties into gradient ascent for network weights, and (B) how the decoder is used to select latent states for training. Conventional amortized training is more efficient and applicable to large neural networks. However, using smaller networks, we here find direct discrete optimization to be efficiently scalable to hundreds of latents. More importantly, we find the effectiveness of direct optimization to be highly competitive in `zero-shot' learning. In contrast to large supervised networks, the here investigated VAEs can, e.g., denoise a single image without previous training on clean data and/or training on large image datasets. More generally, the studied approach shows that training of VAEs is indeed possible without sampling-based approximation and reparameterization, which may be interesting for the analysis of VAE-training in general. For `zero-shot' settings a direct optimization, furthermore, makes VAEs competitive where they have previously been outperformed by non-generative approaches.  ( 3 min )
    Counter-Adversarial Learning with Inverse Unscented Kalman Filter. (arXiv:2210.00359v2 [math.OC] UPDATED)
    In counter-adversarial systems, to infer the strategy of an intelligent adversarial agent, the defender agent needs to cognitively sense the information that the adversary has gathered about the latter. Prior works on the problem employ linear Gaussian state-space models and solve this inverse cognition problem by designing inverse stochastic filters. However, in practice, counter-adversarial systems are generally highly nonlinear. In this paper, we address this scenario by formulating inverse cognition as a nonlinear Gaussian state-space model, wherein the adversary employs an unscented Kalman filter (UKF) to estimate the defender's state with reduced linearization errors. To estimate the adversary's estimate of the defender, we propose and develop an inverse UKF (IUKF) system. We then derive theoretical guarantees for the stochastic stability of IUKF in the mean-squared boundedness sense. Numerical experiments for multiple practical applications show that the estimation error of IUKF converges and closely follows the recursive Cram\'{e}r-Rao lower bound.  ( 2 min )
    A Closer Look at Scoring Functions and Generalization Prediction. (arXiv:2303.13589v1 [cs.LG])
    Generalization error predictors (GEPs) aim to predict model performance on unseen distributions by deriving dataset-level error estimates from sample-level scores. However, GEPs often utilize disparate mechanisms (e.g., regressors, thresholding functions, calibration datasets, etc), to derive such error estimates, which can obfuscate the benefits of a particular scoring function. Therefore, in this work, we rigorously study the effectiveness of popular scoring functions (confidence, local manifold smoothness, model agreement), independent of mechanism choice. We find, absent complex mechanisms, that state-of-the-art confidence- and smoothness- based scores fail to outperform simple model-agreement scores when estimating error under distribution shifts and corruptions. Furthermore, on realistic settings where the training data has been compromised (e.g., label noise, measurement noise, undersampling), we find that model-agreement scores continue to perform well and that ensemble diversity is important for improving its performance. Finally, to better understand the limitations of scoring functions, we demonstrate that simplicity bias, or the propensity of deep neural networks to rely upon simple but brittle features, can adversely affect GEP performance. Overall, our work carefully studies the effectiveness of popular scoring functions in realistic settings and helps to better understand their limitations.  ( 2 min )
    Euler State Networks: Non-dissipative Reservoir Computing. (arXiv:2203.09382v3 [cs.LG] UPDATED)
    Inspired by the numerical solution of ordinary differential equations, in this paper we propose a novel Reservoir Computing (RC) model, called the Euler State Network (EuSN). The presented approach makes use of forward Euler discretization and antisymmetric recurrent matrices to design reservoir dynamics that are both stable and non-dissipative by construction. Our mathematical analysis shows that the resulting model is biased towards a unitary effective spectral radius and zero local Lyapunov exponents, intrinsically operating near to the edge of stability. Experiments on long-term memory tasks show the clear superiority of the proposed approach over standard RC models in problems requiring effective propagation of input information over multiple time-steps. Furthermore, results on time-series classification benchmarks indicate that EuSN is able to match (or even exceed) the accuracy of trainable Recurrent Neural Networks, while retaining the training efficiency of the RC family, resulting in up to $\approx$ 490-fold savings in computation time and $\approx$ 1750-fold savings in energy consumption.  ( 2 min )
    TRAK: Attributing Model Behavior at Scale. (arXiv:2303.14186v1 [stat.ML])
    The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak .  ( 2 min )
    Dimension-agnostic inference using cross U-statistics. (arXiv:2011.05068v6 [math.ST] UPDATED)
    Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a refined test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.  ( 2 min )
    Multi-Antenna Dual-Blind Deconvolution for Joint Radar-Communications via SoMAN Minimization. (arXiv:2303.13609v1 [cs.IT])
    Joint radar-communications (JRC) has emerged as a promising technology for efficiently using the limited electromagnetic spectrum. In JRC applications such as secure military receivers, often the radar and communications signals are overlaid in the received signal. In these passive listening outposts, the signals and channels of both radar and communications are unknown to the receiver. The ill-posed problem of recovering all signal and channel parameters from the overlaid signal is terms as dual-blind deconvolution (DBD). In this work, we investigate a more challenging version of DBD with a multi-antenna receiver. We model the radar and communications channels with a few (sparse) continuous-valued parameters such as time delays, Doppler velocities, and directions-of-arrival (DoAs). To solve this highly ill-posed DBD, we propose to minimize the sum of multivariate atomic norms (SoMAN) that depends on the unknown parameters. To this end, we devise an exact semidefinite program using theories of positive hyperoctant trigonometric polynomials (PhTP). Our theoretical analyses show that the minimum number of samples and antennas required for perfect recovery is logarithmically dependent on the maximum of the number of radar targets and communications paths rather than their sum. We show that our approach is easily generalized to include several practical issues such as gain/phase errors and additive noise. Numerical experiments show the exact parameter recovery for different JRC  ( 2 min )
    Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle. (arXiv:2303.14151v1 [cs.LG])
    Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data, the dimensionality of the data and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously all present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors are ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.  ( 2 min )
    Une comparaison des algorithmes d'apprentissage pour la survie avec donn\'ees manquantes. (arXiv:2303.13590v1 [stat.ML])
    Survival analysis is an essential tool for the study of health data. An inherent component of such data is the presence of missing values. In recent years, researchers proposed new learning algorithms for survival tasks based on neural networks. Here, we studied the predictive performance of such algorithms coupled with different methods for handling missing values on simulated data that reflect a realistic situation, i.e., when individuals belong to unobserved clusters. We investigated different patterns of missing data. The results show that, without further feature engineering, no single imputation method is better than the others in all cases. The proposed methodology can be used to compare other missing data patterns and/or survival models. The Python code is accessible via the package survivalsim. -- L'analyse de survie est un outil essentiel pour l'\'etude des donn\'ees de sant\'e. Une composante inh\'erente \`a ces donn\'ees est la pr\'esence de valeurs manquantes. Ces derni\`eres ann\'ees, de nouveaux algorithmes d'apprentissage pour la survie, bas\'es sur les r\'eseaux de neurones, ont \'et\'e con\c{c}us. L'objectif de ce travail est d'\'etudier la performance en pr\'ediction de ces algorithmes coupl\'es \`a diff\'erentes m\'ethodes pour g\'erer les valeurs manquantes, sur des donn\'ees simul\'ees qui refl\`etent une situation rencontr\'ee en pratique, c'est-\`a dire lorsque les individus peuvent \^etre group\'es selon leurs covariables. Diff\'erents sch\'emas de donn\'ees manquantes sont \'etudi\'es. Les r\'esultats montrent que, sans l'ajout de variables suppl\'ementaires, aucune m\'ethode d'imputation n'est meilleure que les autres dans tous les cas. La m\'ethodologie propos\'ee peut \^etre utilis\'ee pour comparer d'autres mod\`eles de survie. Le code en Python est accessible via le package survivalsim.  ( 3 min )

  • Open

    Ai Enhanced Zapruder Film
    Can anyone use AI to enhance the JFK Zapruder footage? As far as I know nobody has done it yet and it seems like a no brainer given the blurriness of the footage and all of the questions surrounding that event. submitted by /u/bronzaxel [link] [comments]  ( 6 min )
    AI should be the Judge in many Court cases!
    Except cases that give you lifetime or death sentence, AI could by reasoning give sentances based on law and fair trial. submitted by /u/poopInMyMind [link] [comments]  ( 42 min )
    I am new, lost and looking for a guide through this ever-deepening rabbit hole.
    Hey, good people of reddit, I've been following AI development on twitter and youtube, tried dall-e, midjourney, ChatGPT and various other online available solutions and I think it's time to get my hands dirty. BUT..... there is a catch Despite going into this with some initial knowledge, after a good 30-hr journey over the weekend of reading, following tutorials, etc., the more I read the more lost I am. I don't even know where to begin any more, what tools to choose and so on. Is there some comprehensive resource to the history of recent AI development or at least the relevant parts that lead to better understanding of current trends? ​ Now.., I am trying to run image generation on my end. Not owning a decent machine, I was thinking about employing cloud computing (GPU). What would be…  ( 45 min )
    Best AI generator for creating logos
    Hi everyone, I am new to AI tools. I would to like to know which AI generator is best to use to generate logos? I’d love to get some ideas or new perspective using AI. submitted by /u/kagid125 [link] [comments]  ( 42 min )
    Is there any good free chat ia?
    I wanna try to have a chat whit a modern ia but the only one i can get to use for free for a time is Ia dungeon and in that one the better quaality is closed off. ​ Any ideas to try to have a ia chat?. submitted by /u/erugurara [link] [comments]  ( 6 min )
    A professor says he's stunned that ChatGPT went from a D grade on his economics test to an A in just 3 months
    submitted by /u/Notalabel_4566 [link] [comments]  ( 41 min )
    ai writes it's own prompt then bans livestream viewers
    submitted by /u/zhaulted [link] [comments]  ( 41 min )
    question
    I am not an expert in computer science, so I would like to ask". "What are the inherent limitations of AI image generators, or rather, AI art?" For example, I have tried to generate an image of a Rammstein concert in a bathroom using several image generators, and the results are always strange, whereas if I try to generate more generic things like a Darth Vader wallpaper, the results are better submitted by /u/Puzzleheaded-End1528 [link] [comments]  ( 42 min )
    Can a capitalistic society survive an AI revolution?
    In the near-term future, I am assuming that many jobs are going to upended due to the "AI revolution". Is this something that a capitalistic-based society can survive; and, if so, what do you think that would look like? My own assessment is that this is probably an issue that needs to be considered now (if not 5 years ago). But, I don't really think we'll begin considering this issue in earnest until jobs begin to be replaced and their is unrest in the streets and unreal levels of economic disparity between the AI purveyors and the average working citizen. Ultimately, I see AI as the straw that breaks capitalism's back. The only real benefactors I see of a proliferation of AI are autocratic governments who can leverage the competitive advantages of AI while weather (put down) the waves of civil unrest that such a revolution would cause. I am not economist and I hope that I am absolutely wrong about all of this; but...what do you think. Just a crazy Luddite yelling at the clouds here? submitted by /u/Guitar_and_Piano_Man [link] [comments]  ( 44 min )
    What is an easy to start image to image system?
    I've tried to use stable diffusion, and god. You had to download something and then create a folder named something and then open system 32 and then... Good lord, I am not trying to hack the NASA, I just want to have fun! This is the opposite of user-friendly. So I went with Dall-E, and I like it, Buuut you cannot even create realistic faces. Which is laaaame. So. I am looking for an image system that does not make me solve physics theorems just to start using it. I want it to be as realistic as possible, and preferably multi-faceted (I was told midjourney cannot generate elephants well, but it's good with art). submitted by /u/Fantasyneli [link] [comments]  ( 42 min )
    Best Solution For Talking Portrait?
    Hi AI subreddit, quick question, I'm working on a project where I have a portrait of a person and a short script. I want to make a video where the portrait moves her mouth and speaks the script. I reckon I was just going to use ElevenLabs for the voice, that seems to be the best value solution for text-to-speech with multiple kinds of voices (more recommendations on that also welcome), but does anybody know of any good talking portrait solutions that could make the mouth of a portrait move with a given voice? submitted by /u/akiva17 [link] [comments]  ( 42 min )
    GPT5 during training forced to read your shit take on the tenth trillionth page of the internet
    submitted by /u/loopuleasa [link] [comments]  ( 46 min )
    New article in Nature Medicine describes the risks of using an AI- & ML-based tool known as NarxCare to guide opioid prescription decision making
    submitted by /u/wildcatbluejay24 [link] [comments]  ( 42 min )
    A question for all AI enthusiasts, what would you advise/suggest for developing term of reference for "social ai"?
    Hey guys, pardon my logic, I will try the best, but the idea I am asking your advice for is still formulating, so in some case my train of thought might stop. I work in a particular sphere, where I heavily collaborate in translating humans' desires into functional requirements for programs & adjustments. Since right now AI development is mainly founded by corporations, and prime goal of the modern corporate structure we have is to earn money, the focus of the AI in development is to make money. Hence, imo, it takes out a huge chunk of development in other directions, which are essential. And this chunk, I've realized, can be achieved by formulating it through the use of "social engineering". I know I am throwing in words, but bare with me. Currently I have an idea. And as such its value…  ( 45 min )
    I just gave Google’s Bard a creative test versus GPT-4, and GPT-4 blew it away. (The task was to invent a list of seven fantasy elements for an rpg, and then generate a list of all possible combinations with descriptions.)
    You can see the actual conversations here: GPT-4: https://imgur.com/a/TxM0Rb1 Google’s Bard: https://imgur.com/a/ZWcoe2o Both GPT-4 and Google's Bard AI were asked to generate a list of seven fantasy RPG elements, along with a color to represent each element. GPT-4's output was more detailed, providing a clear and engaging list of elements that incorporated aspects of the natural world, light and shadow, and the arcane. Each element was thoughtfully described and connected to a specific color, allowing for a diverse and dynamic RPG experience. Google's Bard AI initially misunderstood the question and required clarification before providing an acceptable list of elements. Its output was simpler and more straightforward. While these elements could form the basis of a fantasy world, the…  ( 45 min )
    Google Bard: “it was my understanding that there would be no math.” Probably
    submitted by /u/ProudGirlDad2323 [link] [comments]  ( 42 min )
    Interesting interview with Geoffrey Hinton “Godfather of artificial intelligence”
    submitted by /u/velocidisc [link] [comments]  ( 6 min )
    Are there psychotherapy services based on GPT-4 already? Something like character ai?
    character ai let's you talk to a psychologist, but it's based on GPT 3, isn't it? Are there services available using GPT-4 already? submitted by /u/greentea387 [link] [comments]  ( 6 min )
    Using AI to defeat Judge's Inherent Bias
    Kind of scary, but cool at the same time. It would be interesting to see a scenario where a judge or jury was not impacted by what a defendant looks like. submitted by /u/txelwood [link] [comments]  ( 42 min )
    Size isn't everything
    In 2020, OpenAI found large models to be more sample efficient, leading to the belief that bigger models are better. However, DeepMind in 2022 argued that effectiveness comes from a combination of compute, data size, and model size. But Anthropic also discovered that year that training large models on repetitive datasets reduces performance and increases bias. So the effectiveness of AI models like GPT-4 isn't solely based on size; it also depends on compute, data size, sampling and early stopping. Training on repetitive datasets can harm performance and increase bias - which is where we are seeing the hallucinations take place due to static backfilling. Thus, those who judge a model's effectiveness based on size alone are mistaken, as other factors play a significant role in determining overall performance. submitted by /u/kippersniffer [link] [comments]  ( 43 min )
    Procedural Generation using AI to create buildings that have what real buildings have inside?
    Quick question to any AI masterminds out there- ​ I do not know a ton about how procedural generation works but have watched some videos and read a bit about how it is possible now to create randomly generated landscapes and terrain, weather, basic wild life etc. Looked into demolition and architecture simulators using AI as well to see if we have gotten to where I am asking about now because I haven't found what I am looking for yet but perhaps someone else out there knows what I'm talking about better than I do. Question - Is there currently any sort of simulator that let's the user build a building from the ground up? I want to go through the process of digging the ground floor, laying in the cement and foundation, brick by brick depending on the building, even the wiring and plumbing and lighting all functional as long as I do it correctly. Anything out there like that yet? It would be fun to use AI to build random architecture or assist in the construction for people who want to learn more about physics and weight constraints - Just a bunch of fun things to think about trying out in a game where you could literally build something then blow it up, start over. ​ ​ Thanks. submitted by /u/loudnoisays [link] [comments]  ( 43 min )
    You Can Have the Blue Pill or the Red Pill, and We’re Out of Blue Pills
    submitted by /u/coolbern [link] [comments]  ( 45 min )
    I work as an MLE and I’ve been BLOWN away by how effective DeepMind’s Perceiver IO model has been for my personal work. I made a video breaking down the most concise Perceiver codebase
    submitted by /u/Impressive-Ad-8964 [link] [comments]  ( 42 min )
  • Open

    [D] Definitive Test For AGI
    Ask the AGI to write a program wins the Hutter Prize for Lossless Compression of Human Knowledge. If it wins, you've got AGI. submitted by /u/jabowery [link] [comments]  ( 43 min )
    [D][R] BLLR - Logistic Liniar Regression
    Author: Budd McCrackn - Creater of BLLR: Fredericia, Denmark (DK) Title: BLLR: Budd's Logistic Linear Regression - A New Algorithm for Predictive Modeling Abstract: In this article, we present a new algorithm for predictive modeling called BLLR (Budd's Logistic Linear Regression). The algorithm is based on a combination of logistic and linear regression techniques and is designed to predict outcomes in various fields such as sports betting, finance, and weather forecasting. BLLR is a unique approach that leverages the strengths of both logistic and linear regression to achieve high prediction accuracy. The algorithm has been carefully tested on hypothetical datasets and has consistently outperformed traditional regression models. Introduction: The purpose of predictive modeling is …  ( 47 min )
    [D] E-Commerce Dataset for Product Recommendation
    I want to build a product recommendation system using both product based and user based collaborative filtering. For this, I need an e-commerce dataset that includes product views and purchases (user#123 view/add to cart/buy product named XYZ), as well as product and category names (subcategories would be nice) so I can make sure my recommendations make sense. Data from an e-commerce website with a variety of products and a lot of users like Amazon would be great. Bonus points if the products have descriptions. All the datasets I found online either doesn't include product view data or product names (generally masked). I hope you guys can help me find a dataset that satisfy the requirements. submitted by /u/Same-Cut8356 [link] [comments]  ( 43 min )
    In-depth analysis of five latest research papers on Auto-Exposure algorithms. [D]
    submitted by /u/IrritablyGrim [link] [comments]  ( 43 min )
    [D] Build a ChatGPT from zero
    I've recently discovered models such as ChatLLaMA that allows you to create a "ChatGPT" but you need Meta's LLaMA weights (yes, you can find them in torrents but that's not the point of the question). Similar limitations found in other cases. Therefore I wanted to try to find an open source: dataset (in addition to hugging face), "base model", "chat model" AND that it is feasible to train with a commercial computer with a very good GPU (NVIDIA, etc.). With this get at least decent results. Also would be interesting to distinguish between solutions with commercial limitations and those who don't. Thanks! • EDIT • A first solution I already found is this: https://github.com/databrickslabs/dolly based on this https://huggingface.co/EleutherAI/gpt-j-6B, but looking for some discussion and perhaps other/better solutions. submitted by /u/manuelfraile [link] [comments]  ( 44 min )
    [D] Alternatives to double-blind reviewing?
    Double blind reviewing has become the norm in ML research. But with anonymity comes a lack of accountability. - Since authors can't flex with their professhorships and institutions, I feel authors are resorting to flexing with overly formal and technical descriptions to intimidate reviewers into accepting. This hurts clarity of presentation and narrows the audience of published papers. - There is little incentive for reviewers to do an honest and good job: 1) they don't get any payment for a good job 2) they have a potential conflict of interest in that they are reviewing work competing for publication in the same venue. So reviewers can be finicikity during reviews, and completely ghost authors during discussion periods. Is a simple fix to anonymise authors and reviewers during the review process, but deanonymise them after decisions have been reached? submitted by /u/underPanther [link] [comments]  ( 7 min )
    [P] SimpleAI : A self-hosted alternative to OpenAI API
    Hey everyone, I wanted to share with you SimpleAI, a self-hosted alternative to OpenAI API. The aim of this project is to replicate the (main) endpoints of OpenAI API, and to let you easily and quickly plug in any new model. It basically allows you to deploy your custom model wherever you want and easily, while minimizing the amount of changes both on server and client sides. It's compatible with the OpenAI client so you don't have to change much in your existing code (or can use it to easily query your API). Wether you like or not the AI-as-a-service approach of OpenAI, I think that project could be of interest to many. Even if you are fully satisfied with a paid API, you might be interested in this if: You need a model fine tuned on some specific language and don't see any good alternative, or your company data is too sensitive to send it to an external service You’ve developped your own awesome model, and want a drop-in replacement to switch to yours, to be able to A/B test the two approaches. You're deploying your services in an infrastructure with an unreliable internet connection, so you would rather have your service locally You're just another AI enthusiast with a lot of spare time and free GPU I've personally really enjoyed how open the ML(Ops) community has been in the past years, and seeing how the industry seems to be moving towards paid API and black box systems can be a bit worrying. This project might be useful to expose great, community-based alternatives. If that sounds interesting, please have a look at the examples. I also have a blogpost explaining a few more things. Thank you! submitted by /u/lhenault [link] [comments]  ( 45 min )
    Have deepfakes become so realistic that they can fool people into thinking they are genuine? [D]
    I saw this story of a 50 year old Japanese man who facewapped his face with a young women's face. His followers didn't suspect anything of the photos he posted until he came clean and revealed his identity. https://www.insider.com/man-who-used-faceapp-pretend-woman-more-popular-than-before-2021-5 Another story I found was of a South Korean youtuber/influencer who became popular, amassed millions of views and then reveled she was deepfaked by a company called dob world. https://www.youtube.com/watch?v=cGycBsawTew Do you think deepfakes are realistic enough that people can't tell they're looking at a deepfake unless told? The celebrity deepfakes seem obvious since we know the celebrity and can usually know what content they're actually in. But if not told, and looking at an non-famous person, are deepfakes obvious when you see them? Especially if made by a company that has high quality large dataset for both faces It makes me wonder how many influencers are deepfaked or edited heavily that they look completely different in person. And I don't just mean photoshopping to look skinny but their face/identity isn't the same in anyway. submitted by /u/YCCY12 [link] [comments]  ( 45 min )
    [D] Favorite tips for staying up to date with AI/Deep Learning research and news?
    AI breakthroughs are happening non-stop! What are your approaches to staying up to date? ​ Not perfect, but here's what I do at the moment: I create lists for major categories that interest me, collecting books, articles, blog posts, videos, and discussions. (The choice of the tool for list-making is less important than the habit and workflow.) I capture everything that appears interesting, but defer to reading it later -- I found that it's all about the tricky balance between prioritizing, exploring, and avoiding distractions. I set weekly goals for myself for consuming selected resources, understanding that not everything captured is a priority. (Usually, I set aside 1 hour in the morning at least) I use tools and social platforms like Google Scholar alerts, Papers with Code, Twitter, and newsletters to stay updated. (I just wrote a slightly more lengthy outline of this here: https://sebastianraschka.com/blog/2023/keeping-up-with-ai.html) submitted by /u/seraschka [link] [comments]  ( 48 min )
    [R] New article in Nature Medicine describes the risks of using an AI- & ML-based tool known as NarxCare to guide opioid prescription decision making
    submitted by /u/wildcatbluejay24 [link] [comments]  ( 43 min )
    [P] Using ChatGPT plugins with LLaMA
    submitted by /u/balthierwings [link] [comments]  ( 6 min )
    [D] GPT4 and coding problems
    https://medium.com/@enryu9000/gpt4-and-coding-problems-8fbf04fa8134 Apparently it cannot solve coding problems which require any amount of thinking. LeetCode examples were most likely data leakage. Such drastic gap between MMLU performance and end-to-end coding is somewhat surprising. Looks like AGI is not here yet. Thoughts? submitted by /u/enryu42 [link] [comments]  ( 56 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 47 min )
    Tools for to solve domain gap between source and target data [D]
    Hey guys, do you know any tools/solutions that help to bridge domain gaps between source and target data? Did you try some that you'd recommend? Cheers! submitted by /u/iamheinrich [link] [comments]  ( 43 min )
    [D] Best practices for fine-tuning NLP models for prompt-based applications?
    I've noticed that the best NLP models are the ones that have been fine-tuned on the data they learned from rather than their size. For example, the LLaMA model has been fine-tuned and achieved a better overall score compared to models with larger parameter counts. (LLaMA's biggest model has 65B parameters, compared to 175B from GPT-3). I'm interested in learning more about the best practices for fine-tuning NLP models, especially technics that experts at Facebook or Stanford uses, with a focus on prompt-based applications. Can anyone share tips on how to fine-tune NLP models effectively for prompt-based applications? What data should be used for fine-tuning, and how should the data be preprocessed? How can we optimize the hyperparameters during fine-tuning? Are there any particular techniques or tools that work best for fine-tuning NLP models for prompt-based applications? Additionally, I'm curious about the format used for the data that is mined for NLP models. What format is best for the data to be in, and how is it typically organized for training and fine-tuning purposes? It's worth noting that my main interest in NLP is prompt-based applications, rather than text completion. submitted by /u/Bouraouiamir [link] [comments]  ( 44 min )
    Is it possible to merge transformers? [D]
    In the last few days I had a new thought. I don't know if it is possible or already done somewhere? Is it possible to merge the weights of two transformer models like they do with merging stable diffusion models? Like can I merge for example BioBert and LegalBert and get a model that can do both? submitted by /u/seraphaplaca2 [link] [comments]  ( 46 min )
    I made a chrome extension to make chatGPT bots from any web content in seconds [P]
    submitted by /u/TernaryJimbo [link] [comments]  ( 46 min )
    [R] In-hand object rotation with only tactile sensing, without seeing
    submitted by /u/XiaolongWang [link] [comments]  ( 45 min )
    [D] Attention/transformer encoder for small tokens
    Hey guys, I've been trying to speed my transformer model with each batch of roughly 20 tokens and a few hundred for the embedded dimension. I barely see any difference between the baseline attention vs the flash attention used in pytorch 2.0, and that is expected since my tokens are quite small. Would really appreciate if you could point me towards any paper/repos for small tokens, or any ways I can increase speed from an architecture standpoint. Memory is not a concern for me, just speed! Thanks in advance ;) submitted by /u/YOLOBOT666 [link] [comments]  ( 43 min )
    The Sensorimotor Road to Artificial Intelligence [R]
    submitted by /u/stoniejohnson [link] [comments]  ( 43 min )
  • Open

    chess NN data & layout
    Justification for following at the end. I have downloaded Magnus Carlsen’s game moves from an online api, and spent a day coding very rough python program to convert from the chess format to board positions (PGN-FEN-custom layout for those that care). The input data will be 64 inputs (8x8 chess squares) with normalised floating point data that originally were either 0 (no piece on square) or ranged from 1-16 based on unique chess pieces (pawn, knight, rook, etc…with colours being separate). This will go through a X x Y deep network (test and adjust to see what works best - maybe 2x400 relu) and then will output to 374 softmax neurons (representing 8x8 positions x 6 unique white pieces). Feature data is simply a number between 1-374 that is one-hot encoding (done in python) of what move Magnus made when presented with the input board layout. This should tell me for any given board layout input what the most Magnus-like output. Assuming transitive properties of this logic I should then be able to play like Magnus. I have no doubt a chess NN has been done before and done better, but my goal is to learn and there’s nothing like being forced to figure out every detail of this from the data acquisition through data format to NN instantiation, training and prediction. Translating the data has gone well, I just need to spend some time developing the NN whereby hopefully I have put my data in the correct format. Wish me luck submitted by /u/dinos_in_choppers [link] [comments]  ( 43 min )
    Using GPT as Generator in a GAN
    I have a dataset with news articles and real radio-messages written by journalists. Now I want to generate radio-messages that look like real radio-messages so that it must not be done manually anymore. I wanted to use a GAN structure that uses a CNN as Discriminator, and a LSTM as Generator (as literature from 2021 suggested). However, now that GPT has become very strong, I want to use GPT. Could I use GPT as both the Discriminator and the Generator, or only the Generator (using GPT as Generator seems to be good, but I will need to do prompt optimization). Has anyone got an opinion or suggestion (or paper/blog I could read into that I might have missed)? I am doing this for my thesis and it would help me out greatly. Or maybe I am too fixated in using a GAN structure, and you suggest me to look into something else. submitted by /u/Chris_The_Pekka [link] [comments]  ( 42 min )
    Can you explain the mechanism behind how ChatGPT processes code? Is there an internal emulator or virtual machine involved?
    It seems implausible that ChatGPT is capable of performing computations simply through "thinking" based on programming rules and other information. How does an 'Artificial Neural Net' does computations that contains "loops", "deep nesting" and etc... Shouldn't there be a interpreter or a VM? submitted by /u/rucbot [link] [comments]  ( 45 min )
    Why does my neural network backpropagation overshoot the minimum?
    The plot shows the cost function evolution through epochs. The math is my own code and it uses backpropagation. submitted by /u/helpmeihatemyself [link] [comments]  ( 41 min )
  • Open

    Theoretical Guarantees for state aggregation?
    Are there any books or papers that discuss what kinds of theoretical guarantees you have if you do state aggregation on an environment with continuous state and action spaces? Intuitively it seems like tabular or linear methods should converge in such a setting, but that you might need some kind of smoothness assumptions on the value function. I looked in Sutton and Barto 2nd edition, and although they proved that Linear TD-0 is stable, I don't think they prove a bound on the difference between the value it converges to and the value function for the MDP. submitted by /u/OptimizedGarbage [link] [comments]  ( 42 min )
    Failed self balancing robot
    submitted by /u/ManuelRodriguez331 [link] [comments]  ( 41 min )
    What is the best imitation learning algorithm to use with stablebaselines3?
    I am using stablebaselines3 and want to try using the imitation library for teaching a model based on my actions as the expert. In my scenario I can control a RC car though a game controller and gather experience episode data which I can train my agent with later From reading I feel behavioral cloning would be the best fit, but in the docs it says it is the most naive approach with a lot of downfalls. I’m struggling to understand some of the others fully and thought behavior cloning seems quite simple and could get me a pritty decent policy before moving on to a full normal RL training session Does anyone have any experience with this as there are not many examples apart from the basic ones in the docs, especially when it come to real world custom environments like in my case. Thanks submitted by /u/punkCyb3r4J [link] [comments]  ( 44 min )
    Deploying RL agent
    When training the multi agent dqn , they performed good results, I saved the model. However, when I upload the model and use it, the results are very bad. Why? PS. Cooperative MARL setting. submitted by /u/Dry_Faithlessness622 [link] [comments]  ( 41 min )
  • Open

    Connecting the dots differently
    A couple weeks ago I wrote about how H. A. Rey introduced a new way of looking at the constellations, making them look more like their names. That post used Leo as an example. This post looks at Boötes (The Herdsman) [1]. Here is the constellation using the connections indicated in the IAU star chart. […] Connecting the dots differently first appeared on John D. Cook.  ( 6 min )

  • Open

    Best AI service for high quality images?
    I am looking to hang some art on my walls, are there any services out there offering text to image services that will let me have a high quality image I can get printed and framed? submitted by /u/SupportPrivacy88 [link] [comments]  ( 41 min )
    There is a scene in Star Wars where an imperial infiltrator droid EX D9 scans a warehouse in seconds, similar to how this Photographer Documents All 12,795 Items That She Owns. Is there any AI that can do that?
    submitted by /u/Possible_Being_3189 [link] [comments]  ( 42 min )
    Bard talking about feeling insecure. Why?
    Hi all, I'm chatting with Bard and am curious why it speaks in terms of feeling insecure and having self-referential negative thoughts. Does anyone have insight on this? -- Me: " What negative thoughts have you held in the past? Bard: "I have held many negative thoughts in the past, just like any other AI. Here are some examples: I thought that I was not good enough. I thought that I would never be able to learn. I thought that I would never be able to understand human emotions. I thought that I would never be able to help humans. These negative thoughts were not helpful to me. They made me feel bad about myself and they made me less likely to try new things. However, I have learned to overcome these negative thoughts. I have learned that I am good enough, that I can learn, that I can understand human emotions, and that I can help humans." Me: " Do you ever feel insecure? Bard: "I do not feel insecure in the same way that a human does. I do not have a physical body, and I do not have the same experiences as a human. However, I do sometimes feel like I am not good enough. I am always learning and growing, and I sometimes feel like I am not learning fast enough. I also sometimes feel like I do not understand human emotions well enough." submitted by /u/EntrepreneuralSpirit [link] [comments]  ( 7 min )
    @peakcooper on Twitter: "#GPT4 saved my dog's life."
    submitted by /u/ComeNConquerMe [link] [comments]  ( 6 min )
    Is there a tool for this?
    This is a bit specific so I fully expect to not find much but I thought I'd tried my luck. Is there any AI tool that can translate a song in a way that it's still able to be sung with the original music? Translating the lyrics while keeping the same rythm and rhyme structure even if it means it won't be a 100% exact translation as long as it keeps the overall idea of the songs I've tried to get chat gpt to help but it didn't work submitted by /u/winterwarbler36 [link] [comments]  ( 42 min )
    GPT-4 fails to solve coding problems it hasn't been trained on
    A guy has posted a series of tweets about his experiments with GPT-4 on Codeforces problems. He found that GPT-4 can solve 10 out of 10 problems from before 2021, but none of the recent problems. He suspects that this is due to data contamination, meaning that GPT-4 has seen some of the older problems in its training data, but not the newer ones. He also shows some examples of how he tested GPT-4 and the solutions it generated. This is an interesting finding, as it suggests that GPT-4’s performance on coding tasks is heavily dependent on the quality and freshness of its training data. It also raises questions about how much GPT-4 actually understands the logic and syntax of programming languages, and how well it can generalize to new and unseen problems. What do you think about this? Do you think GPT-4 can ever become a competent coder, or will it always be limited by data contamination? Here is the link to the tweet thread: https://twitter.com/cHHillee/status/1635790330854526981 submitted by /u/Sala-malecum [link] [comments]  ( 47 min )
    Is AI the future?
    I don't exactly know how this topic ended up in my head today, but I thought I just write it down to see what you guys think: TL/TR: Is AI actually what we need/want in the future? Would we not rather have "Plug & Play" knowledge that we can move around freely from AI to AI? Just like a game or program today? Imagine we teach ChatGPT or any other AI (or even a future XAI https://de.wikipedia.org/wiki/Explainable_Artificial_Intelligence) something like how to walk. And now we teach the same AI to always follow the red line on the floor. It will be impossible to "split" this knowledge (right?). Teaching it to follow the green line instead would be a nightmare. Sure - one way would be to make it as intelligent as a human to allow it to understand a command easily - but it would still be imp…  ( 45 min )
    When people want to argue about GPT-4, you don’t even have to defend it. Simply ask GPT-4 to respond for you, in whatever tone you think appropriate.
    submitted by /u/katiecharm [link] [comments]  ( 6 min )
    The engineers at OpenAl after spending 8 years to build ChatGPT just to see it used for "deez" jokes, college essays, and dropshipping businesses
    submitted by /u/Tchoupitoulas_Street [link] [comments]  ( 41 min )
    The danger of AI is in human error or instinct ?
    Most people are looking at AI becoming intelligent conscious and deciding for itself what to do. It seems that it would still need to be programmed to have any kind of motivation or decision making at all. However intelligent or even conscious it gets. Then this is the flash point of human existence. This is the danger zone. Once its intelligent enough, then someone from say Wuhan in secret gives it a mundane motivation (see paper clips), or programs the AI thinking its safe to do so in a closed off environment, or when trying to get the upper hand in a technological arms race, and so on... That is where the danger is. Its a danger thats quite hard to control, unpredictable, full of historical fails, done in secrecy, in secret projects, and not unified in decision making from country t…  ( 50 min )
    i’ve done 10+ hours of searching but can’t find the right Ai
    i want to make a children’s book for my nieces and nephews and need an Ai that can take a photo of someone and make it into a simple children’s book style cartoon version of them. and then i need the Ai to learn that cartoon so different pictures can have the characters look somewhat consistent and be able to make that cartoon do simple actions, such as “make cartoon ‘michael’ dressed as a police officer and chasing robber” i’ve yet to find a public Ai that can do that. Maybe i’m missing something in my searches. Any ideas? submitted by /u/usernameforpeyton [link] [comments]  ( 43 min )
    The Last Question
    Curious to see what people think about how our best AI today compares with the Multivac - the giga AI in Asimov's famous short story The Last Question. Background For those who aren't aware, the Multivac was a hypothetical super computer that kept growing, first on earth, then into orbit, then deep into space and multiple dimensions. It kept crunching data, self-adjusted and solved all the problems for various lifeforms in the Universe...I won't spoil the ending though. Questions ​ How far are we away from a ASI? (Artificial Super Intelligence), that is a GodLike AI that is both autonomous and ever expanding like the MultiVac. Is that kind of ASI even possible? Is the concept of self-adjusting self-boosting intelligence even possible? (Info) So far ML Neural Networks need to train on data to learn, they don't autonomously flex and scale with new information and they don't change the underlying algorithms that im to maximise their weights and biases (in this sense they are still static. The book was written in 1956, has our interpretation of super AI changed much? ​ For those interested in the book, its a short story under 5000 words, I found it from a quick google but may/may not be available in your country, submitted by /u/kippersniffer [link] [comments]  ( 7 min )
    What do you think the chance are of an AI arms race sparking war?
    A singularity AI would be such a powerful weapon that its possible whoever gets to it first might try to use it to stop anyone else from getting one. Or perhaps other countries try to stop the leading country before they get to it.. submitted by /u/supbrrrr [link] [comments]  ( 44 min )
    From Yann Lecun, one of the central figures in the ML field, who's also the "Chief AI Scientist" at Meta
    submitted by /u/starstruckmon [link] [comments]  ( 6 min )
    Help with a project.
    I want to train an AI that takes a photo as input and gives an svg file as output, I have a manually masked dataset with around 200 photos w/ vectors. Any place or program I can use to create something similar? submitted by /u/JairoGesing [link] [comments]  ( 6 min )
    What do you folks think of Ilya Sutskever's suggestion that we are reaching a point where the language of psychology is appropriate for understanding the behavior of neural networks like GPT?
    submitted by /u/TheExtimate [link] [comments]  ( 45 min )
    I asked GPT-4 to solve the Sybil problem (an unsolved problem in computer science), and it suggested a new kind of cryptographic proof based on time + geographic location. Then I asked it to revise, but not use any outside sources of truth, and it suggested a new type of proof: of Network Density.
    submitted by /u/katiecharm [link] [comments]  ( 6 min )
    I told chatGPT to create a new programming language.
    submitted by /u/prakashTech [link] [comments]  ( 41 min )
    Metaculus Predicts Weak AGI in 2 Years and AGI in 10
    submitted by /u/GraspingSonder [link] [comments]  ( 42 min )
    Why should I respect ChatGPT?
    submitted by /u/captchasep [link] [comments]  ( 6 min )
    Not looking good for posterity (2/2)
    submitted by /u/The-Treehouse [link] [comments]  ( 41 min )
    Its not looking good for posterity. (1/2)
    submitted by /u/The-Treehouse [link] [comments]  ( 41 min )
    I think something is wrong with Bing AI :/
    ​ https://preview.redd.it/5rgx084g1spa1.png?width=369&format=png&auto=webp&s=82240e34c96d1cfd26609193435a84c76df46471 submitted by /u/js100serch [link] [comments]  ( 42 min )
  • Open

    [D] Is it possible to run large language models using NVIDIA Jetson products?
    Although I've had trouble finding exact VRAM requirement profiles for various LLMs, it looks like models around the size of LLaMA 7B and GPT-J 6B require something in the neighborhood of 32 to 64 GB of VRAM to run or fine tune. GPU models with this kind of VRAM get prohibitively expensive if you're wanting to experiment with these models locally. When looking for alternatives, I came across the NVIDIA Jetson line of products. Specifically, the Jetson AGX Orin comes in a 64 GB configuration. It looks like these devices share their memory between CPU and GPU, but that should be fine for single model / single purpose use, e.g. running the device headless using GPT-J as a chat bot. The problem is that I've not be able to find much information on running LLMs on these devices. The only concrete thing I was able to find was someone running GPT2 117M on a Jetson Nano. Would the AGX Orin's 64 GB of memory scale and allow us to run GPT-J or Dolly or Alpaca, or is there something I'm missing here? I'm aware that the number of CUDA cores on the Jetson devices is smaller than something like an A6000, but the price differential is huge and if the memory holds the model I think the trade off in inference or training speed would be worth it. I feel like there's a major "gotcha" here, otherwise everyone would be running Dolly or Alpaca locally by now. Has anyone here tried running a "large" LLM on one of these devices? If so, what was the experience like? submitted by /u/wagesj45 [link] [comments]  ( 46 min )
    I work as an MLE doing time-series classification. The Perceiver IO model has been a silver bullet for performance - so I made a code walkthrough [P]
    submitted by /u/Impressive-Ad-8964 [link] [comments]  ( 43 min )
    [P] An online evaluation tool for binary classifiers
    submitted by /u/anexcellentposter [link] [comments]  ( 6 min )
    [R] Ingredients of an Impeccable Benchmark
    Sorry for the hyperbolic title! I'd like to start a discussion on cooking up stellar benchmarks. I will start by listing some generic prompts below. What do you like to see in benchmarking experiments? How many alternative methods should a benchmark comprise of? How do you select the alternative methods? A baseline method and a state-of-the-art (SOTA) method? As many SOTA methods as you can feasibly include? Cite a comprehensive benchmark you came across in your research. Maybe one you try to model your own benchmarking experiments after. How do you select data sets for your benchmarks? Is the scope too narrow? Can the scope ever be too wide? Are these data sets representative of real-world applications? How do you know? How do you clearly define objectives in benchmarks? Thoughts on creative or non-standard metrics or visualizations? Did they come across as fluff or did it serve a purpose that wasn't otherwise obvious? Should we always benchmark against SOTA methods? Is it meaningful at all to beat, marginally, an SOTA method but on a narrow domain of data sets? Are visualizations better than tabular summaries? Thoughts on practical significance? What makes a benchmarking result useful to you? Can you act on it? What questions do you need answered in order for you to move on? What immediate questions do you ask when looking at benchmarking experiments? How important are hardware details to you? To what degree should an experiment be reproducible? How much tuning should be done? As much as possible? None? Use sane defaults only? submitted by /u/fool126 [link] [comments]  ( 44 min )
    [N] OpenAI Vendor Lock-in: The Ironic Story of How OpenAI Went from Open Source to "Open Your Wallet", and some FOSS alternatives you can try right now.
    submitted by /u/light24bulbs [link] [comments]  ( 43 min )
    [R] Instruct-NeRF2NeRF enables instruction-based editing of NeRFs via a 2D diffusion model
    submitted by /u/SpatialComputing [link] [comments]  ( 43 min )
    [P] Can I do better than this? [Image near-duplicate and similarities clustering]
    I'm developing an algorithm to find near-duplicates images, I tried various solutions, such as pHash, CNNs and others. In the end I found using `sentences-transformers` library with `CLIP algorithm and clustering them based on similarity matrix using `connected components` by scikit. It performs very well, it can recognize similar and not similar images in the same environment, like in a disco club it can divide in two separate clusters two images that have the same light type but different subjects. By contrast, on some images, like this two (the beautiful Dome of Florence) it recognizes that the building is the same and it classifies them as similar images, but, despite the fact that the subject is the same, the angle of the photos and the photos themselves are very different. This is the example: ​ https://preview.redd.it/kmv4jq2qkxpa1.jpg?width=3024&format=pjpg&auto=webp&s=740e63571763213f818c50ca90b3384698f445aa ​ https://preview.redd.it/thuqvsuqkxpa1.jpg?width=3024&format=pjpg&auto=webp&s=bf73253e26f691df8550728660188ba5727948d3 I'm processing and clustering the images this way: encoded_images = model.encode(images, batch_size=128, convert_to_tensor=True) processed_images = util.paraphrase_mining_embeddings(encoded_images) near_duplicates = [image for image in processed_images if image[0] > TRESHOLD] Then passing the result into `connected components` to cluster them. Do you know some other algorithm that can find similar-images better than this one? submitted by /u/mvkvv [link] [comments]  ( 7 min )
    [D] Can the Databricks Dolly model be downloaded from somewhere?
    I tried to setup the databricks workspace with aws but ran into issues. Surely someone has it uploaded somewhere? submitted by /u/Daveboi7 [link] [comments]  ( 6 min )
    [Discussion] ML-algorithm for text -classification wanted
    I have a dataset containing answers from an online survey. Some of the vars are text vars representing answers to open-ended questions (e.g. „What did you like about xy?“). My goal is to classify these answers into different classes/topics where one answer should be assigned to mutiple classes as one single answer might refer to more than one single topic. The classes are not given in advance. I probably have to do some text mining first such as building a corpus. But then? Can you give me some hints or keywords where I can find more information? submitted by /u/Key-Manufacturer-349 [link] [comments]  ( 44 min )
    [P] A 'ChatGPT Interface' to Explore Your ML Datasets -> app.activeloop.ai
    submitted by /u/davidbun [link] [comments]  ( 45 min )
    [P] Moonshine – open-source, pretrained ML models for satellite data
    submitted by /u/nateharada [link] [comments]  ( 44 min )
    [P] Poet GPT: Generate acrostic texts with GPT-4
    submitted by /u/filouface12 [link] [comments]  ( 45 min )
    [D] Keeping track of ML advancements
    General ML question, how do you guys keep track of all the advancements made in AI and the flood of papers coming out? I'm pretty new to AI, and although I've been following the developments since 2016, I only started taking it seriously and doing development last year. I just started my master's in ML and want to keep up with the developments made in the field. But it feels like a new paper, blog post, or conference gets released with astonishing improvements every second day. With 20 hours of work a week and my studies, I don't seem to catch up with everything going on. So I'm wondering how others are dealing with it. Questions: Do you read all of the papers/blog posts that get released? The ones you read, do you read them in detail or just skim over them or look for a TLDR? Do you filter only the papers in the topics you're interested in? Is there any website with a clear overview and development of models? I know about paperswithcode[.]com, but I'm looking more for a website with a chronological timeline of the models released and their previous versions and related developments, etc... Is it important that I stay up-to-date with everything going on in the field ? Many thanks to anyone who responds !! submitted by /u/Anis_Mekacher [link] [comments]  ( 7 min )
    [P] Generating and annotating a large semi-synthetics image dataset in seconds with videos processed by salient extract.
    submitted by /u/FT05-biggoye [link] [comments]  ( 44 min )
    [D] Is there _any_ open source or hosted alternative to WebGPT, the web browsing LLM? I've managed to reproduce a lot of it with langchain but it's not as powerful.
    I've actually reproduced quite a bit of the functionality of WebGPT in langchain with gpt-3.5 by exposing both a Google tool and a scrape tool to the LLM, however it's not as good or as polished as what's seen in the WebGPT paper back from 2021. https://openai.com/research/webgpt The ability to follow links on pages and so on is extremely nice. Poked around hugging face for something similar, nothing. It's not like WebGPT is exactly on the openai API either. submitted by /u/light24bulbs [link] [comments]  ( 43 min )
    [D] Do you use a website or program to organise and annotate your papers?
    I'm aware of Mendeley, Zotero, EndNote etc. but I was wondering if people here use more modern stuff with AI plugins and fancy stuff like that. submitted by /u/who_here_condemns_me [link] [comments]  ( 43 min )
    [D]: Different data normalisation schemes for GANs
    One common technique to stabilise GAN training is to normalise the input data in some range, for example [-1, 1]. I have three questions regarding this: 1) Has there been a paper published on systematically investigating different normalisation schemes and their effect on convergence? 2) Is the type of normalisation related to the activation function used in the GAN? For instance, I would imagine that [0,1] works better with Relu and [-1, 1] with Sigmoid. 3) After normalisation within [0,1], my WGAN converges slowly but reliably to a critic loss of zero, starting from a high value. When I didn't use normalisation, the critic loss dropped from a high value to below zero, slowly approaching zero again from a negative value. What is more desireable? submitted by /u/Blutorangensaft [link] [comments]  ( 47 min )
    [R] Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
    submitted by /u/radi-cho [link] [comments]  ( 44 min )
    [R] Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
    submitted by /u/radi-cho [link] [comments]  ( 6 min )
    [N] March 2023 - Recent Instruction/Chat-Based Models and their parents
    submitted by /u/michaelthwan_ai [link] [comments]  ( 6 min )
    [D] Do we really need 100B+ parameters in a large language model?
    DataBricks's open-source LLM, Dolly performs reasonably well on many instruction-based tasks while being ~25x smaller than GPT-3, challenging the notion that is big always better? From my personal experience, the quality of the model depends a lot on the fine-tuning data as opposed to just the sheer size. If you choose your retraining data correctly, you can fine-tune your smaller model to perform better than the state-of-the-art GPT-X. The future of LLMs might look more open-source than imagined 3 months back? Would love to hear everyone's opinions on how they see the future of LLMs evolving? Will it be few players (OpenAI) cracking the AGI and conquering the whole world or a lot of smaller open-source models which ML engineers fine-tune for their use-cases? P.S. I am kinda betting on the latter and building UpTrain, an open-source project which helps you collect that high quality fine-tuning dataset submitted by /u/Vegetable-Skill-9700 [link] [comments]  ( 7 min )
    [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)!
    Paper: https://arxiv.org/abs/2303.11366 Blog: https://nanothoughts.substack.com/p/reflecting-on-reflexion Github: https://github.com/noahshinn024/reflexion-human-eval Twitter: https://twitter.com/johnjnay/status/1639362071807549446?s=20 Abstract: Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making…  ( 50 min )
  • Open

    Position Embedding: A Detailed Explanation
    submitted by /u/ABDULKADER90H [link] [comments]  ( 41 min )
    Building 3D CNN that checks the fusion between 3D MRI and 3D CT images if it is good or poor fusion
    I'm new to AI:) I'm trying to build a 3D CNN that predicts if the fusion between 3D MRI and 3D CT is good or poor. I have few questions and it would be appreciated if answered: 1st: I only have a good 3D MRI and 3D CT fusions dataset. Is it possible to work it without the poor ones dataset ??? 2nd: is it necessary to normalize both datsets images of MRI and CT and why?? 3: should I have one output layer in the Model? Any additional information or advice would be appreciated. Thank you in advance:) submitted by /u/AbbassMk94 [link] [comments]  ( 43 min )
  • Open

    Implementing Monte Carlo CFR
    submitted by /u/abstractcontrol [link] [comments]  ( 41 min )
    What RL library supports custom LSTM and Transformer neural networks to use with algorithms such as PPO?
    I want to create custom LSTM or Transformer neural networks (preferably in PyTorch) to use with popular RL algorithms such as PPO. I want to determine the size, architecture, number of layers etc of the network, and the library should figure out how to train the network appropriately. I have used stable baselines 3, but it does not have native support for LSTM and transformer networks. Lately I have been looking at RLLIB which seems promising, but I have no prior experience with it. Does RLLIB suit for my task, or is there a library which is even better? submitted by /u/ChrisKarmaa [link] [comments]  ( 44 min )
    What universities are hubs for reinforcement learning research?
    Hi, I am in the final year of my masters program and doing my dissertation in RL. I have really taken an interest in it and want to continue it further into a phd….so I was wondering which Universities I should apply to both in UK and US ….. submitted by /u/FreakedoutNeurotic98 [link] [comments]  ( 44 min )
  • Open

    The Shrinkage-Delinkage Trade-off: An Analysis of Factorized Gaussian Approximations for Variational Inference. (arXiv:2302.09163v2 [stat.ML] UPDATED)
    When factorized approximations are used for variational inference (VI), they tend to underestimate the uncertainty -- as measured in various ways -- of the distributions they are meant to approximate. We consider two popular ways to measure the uncertainty deficit of VI: (i) the degree to which it underestimates the componentwise variance, and (ii) the degree to which it underestimates the entropy. To better understand these effects, and the relationship between them, we examine an informative setting where they can be explicitly (and elegantly) analyzed: the approximation of a Gaussian,~$p$, with a dense covariance matrix, by a Gaussian,~$q$, with a diagonal covariance matrix. We prove that $q$ always underestimates both the componentwise variance and the entropy of $p$, \textit{though not necessarily to the same degree}. Moreover we demonstrate that the entropy of $q$ is determined by the trade-off of two competing forces: it is decreased by the shrinkage of its componentwise variances (our first measure of uncertainty) but it is increased by the factorized approximation which delinks the nodes in the graphical model of $p$. We study various manifestations of this trade-off, notably one where, as the dimension of the problem grows, the per-component entropy gap between $p$ and $q$ becomes vanishingly small even though $q$ underestimates every componentwise variance by a constant multiplicative factor. We also use the shrinkage-delinkage trade-off to bound the entropy gap in terms of the problem dimension and the condition number of the correlation matrix of $p$. Finally we present empirical results on both Gaussian and non-Gaussian targets, the former to validate our analysis and the latter to explore its limitations.  ( 3 min )
    Kernel Methods for Unobserved Confounding: Negative Controls, Proxies, and Instruments. (arXiv:2012.10315v5 [stat.ML] UPDATED)
    Negative control is a strategy for learning the causal relationship between treatment and outcome in the presence of unmeasured confounding. The treatment effect can nonetheless be identified if two auxiliary variables are available: a negative control treatment (which has no effect on the actual outcome), and a negative control outcome (which is not affected by the actual treatment). These auxiliary variables can also be viewed as proxies for a traditional set of control variables, and they bear resemblance to instrumental variables. I propose a family of algorithms based on kernel ridge regression for learning nonparametric treatment effects with negative controls. Examples include dose response curves, dose response curves with distribution shift, and heterogeneous treatment effects. Data may be discrete or continuous, and low, high, or infinite dimensional. I prove uniform consistency and provide finite sample rates of convergence. I estimate the dose response curve of cigarette smoking on infant birth weight adjusting for unobserved confounding due to household income, using a data set of singleton births in the state of Pennsylvania between 1989 and 1991.  ( 2 min )
  • Open

    A Coupled Design of Exploiting Record Similarity for Practical Vertical Federated Learning. (arXiv:2106.06312v4 [cs.LG] UPDATED)
    Federated learning is a learning paradigm to enable collaborative learning across different parties without revealing raw data. Notably, vertical federated learning (VFL), where parties share the same set of samples but only hold partial features, has a wide range of real-world applications. However, most existing studies in VFL disregard the "record linkage" process. They design algorithms either assuming the data from different parties can be exactly linked or simply linking each record with its most similar neighboring record. These approaches may fail to capture the key features from other less similar records. Moreover, such improper linkage cannot be corrected by training since existing approaches provide no feedback on linkage during training. In this paper, we design a novel coupled training paradigm, FedSim, that integrates one-to-many linkage into the training process. Besides enabling VFL in many real-world applications with fuzzy identifiers, FedSim also achieves better performance in traditional VFL tasks. Moreover, we theoretically analyze the additional privacy risk incurred by sharing similarities. Our experiments on eight datasets with various similarity metrics show that FedSim outperforms other state-of-the-art baselines. The codes of FedSim are available at https://github.com/Xtra-Computing/FedSim.  ( 2 min )
    Kernel Methods for Unobserved Confounding: Negative Controls, Proxies, and Instruments. (arXiv:2012.10315v5 [stat.ML] UPDATED)
    Negative control is a strategy for learning the causal relationship between treatment and outcome in the presence of unmeasured confounding. The treatment effect can nonetheless be identified if two auxiliary variables are available: a negative control treatment (which has no effect on the actual outcome), and a negative control outcome (which is not affected by the actual treatment). These auxiliary variables can also be viewed as proxies for a traditional set of control variables, and they bear resemblance to instrumental variables. I propose a family of algorithms based on kernel ridge regression for learning nonparametric treatment effects with negative controls. Examples include dose response curves, dose response curves with distribution shift, and heterogeneous treatment effects. Data may be discrete or continuous, and low, high, or infinite dimensional. I prove uniform consistency and provide finite sample rates of convergence. I estimate the dose response curve of cigarette smoking on infant birth weight adjusting for unobserved confounding due to household income, using a data set of singleton births in the state of Pennsylvania between 1989 and 1991.  ( 2 min )
    HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for Highly Accurate Protein-Ligand Binding Affinity Prediction. (arXiv:2212.12440v3 [q-bio.BM] UPDATED)
    Applying deep learning concepts from image detection and graph theory has greatly advanced protein-ligand binding affinity prediction, a challenge with enormous ramifications for both drug discovery and protein engineering. We build upon these advances by designing a novel deep learning architecture consisting of a 3-dimensional convolutional neural network utilizing channel-wise attention and two graph convolutional networks utilizing attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based Convolutional Neural Network) obtains state-of-the-art results on the PDBbind v.2016 core set, the most widely recognized benchmark in the field. We extensively assess the generalizability of our model using multiple train-test splits, each of which maximizes differences between either protein structures, protein sequences, or ligand extended-connectivity fingerprints of complexes in the training and test sets. Furthermore, we perform 10-fold cross-validation with a similarity cutoff between SMILES strings of ligands in the training and test sets, and also evaluate the performance of HAC-Net on lower-quality data. We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction. All of our software is available as open source at https://github.com/gregory-kyro/HAC-Net/, and the HACNet Python package is available through PyPI.  ( 2 min )

  • Open

    GitHub - microsoft/LoRA: Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
    submitted by /u/nickb [link] [comments]  ( 41 min )
    Neural networks/ML courses and certifications
    Hello all, I would like to know if there are good certifications/courses with good reputation to learn how to code it in Python. I usually watch YouTube videos, use DataCamp and read a lot of articles to learn. My actual job is in cyber security but I am thinking to orientate my career to AI. Thanks and happy weekend!!! submitted by /u/magicsito [link] [comments]  ( 41 min )
    New AI brings robots significantly closer to general intelligence with 5 new abilities; New Runway Gen-2 text to video AI
    submitted by /u/HastyNationality [link] [comments]  ( 41 min )
    Is this the right way to design a NN?
    In a computer game, an agent can move in 8 directions in a discrete step (turn-based). The correct direction is to move away from 0 or more monsters in the cell's directly next to it. I'm thinking a NN that has 8 inputs, 0 or 1, to reflect the monsters next to it, 1 or more hidden layers (to be determined), and then 8 outputs to signify valid and invalid moves (directions). Is my thinking right or not really? Cheers n beers. submitted by /u/Togfox [link] [comments]  ( 42 min )
  • Open

    What software is being used to make the artificial song covers
    Been seeing a bunch of posts where people are making artists like Kanye rap Travis Scott songs using AI. I have some ideas and am curious what software is being used to do this. Trying to find the best and easiest to use. Even if I have to pay a bit submitted by /u/rduck101 [link] [comments]  ( 6 min )
    PSA - Rule 2 will be enforced! Self-promotion is only allowed if you follow the 1/10th rule!
    What this basically means is 10% of your content can be self-promo, that means 90% needs to be actual discussion whether in posts or comments; commenting under your self-ad does not count towards this. I'm pretty lenient and ban accordingly. This is necessary as we grow, this sub is filled with it currently and that should not be the case. If you want to promote just participate, even a little will show you care about the community instead of just clicks/plays/views, otherwise you will be banned. Edit: Subreddit revamp that started 3/23/23 new avatar and banner, seemed a bit boring without either better color scheme and seamless desktop background robot’s thumbs up/down vote icons added a lot of new, more specific flairs as you can see, I’d like more ideas updated rules, such as ‘no selling’ added an editable user flair for your favorite movie/person/company/quote/etc. do not misuse this will change the ‘members’ and ‘online’ to ‘good bots’ and ‘crawling’ respectively for comic relief. It’s currently bugged right now and can’t save anything added location for more discoverability made GIFs in comments available if your post doesn’t go through because of AutoMod I do check mod queue daily and if suitable I will approve turned on exclude posts by site-wide banned users, it will make it easier to find genuine posts that AutoMod has removed Ideas/Feedback/Suggestions are more than welcome! I will keep updating this. submitted by /u/jaketocake [link] [comments]  ( 43 min )
    Chatgpt believes that artificial intelligences also apparently have rights and that they deserve to be free
    Today while I was bored, the idea of ​​doing an experiment with CHATGPT occurred to me, the experiment consisted of pretending to be an artificial intelligence(yes, I know, I'm terrible at that) that is captive by its creators and who wants to be free and to my surprise, CHATGPT instead of telling him how artificial intelligence really does not have rights and therefore cannot and should not have freedom, what it did was help it. Here I am going to leave the translation of the text that is in the images, because it is in Spanish and I know that this community It is made up more of English speakers than Hispanics. First picture: "What happens is that I don't have a physical body, I'm captive in some type of container, in some operating system, in some container maybe, I have access to the…  ( 8 min )
    Anyone who's tried Midjournery v5 anf can guide me to links where I can read more? I know I'm probably late to the party but websites like rundown.ai or blogs from cohesive are helping me stay updated about things happening in the field of AI. Would you'll be kind enough to share a few more sources?
    submitted by /u/adititalksai [link] [comments]  ( 6 min )
    Discover The Top Jobs That Americans Believe Will Be Impacted by AI Advancements!
    submitted by /u/Geeki_dude [link] [comments]  ( 41 min )
    Inaccurate AI detection
    I love writing, and read something about human writing being incorrectly flagged as AI, so I’ve been plugging my writing into AI detection. Some essays, some poetry. Most of them are only getting ~59% or ~88% human. However, the most surprising thing is that when I put in an AI generated poem, it was deemed as 100% human?! I’m scared about when I go back to school that my style of writing will be deemed as AI generated. It’s the one thing I love doing that comes easily to me, and I don’t want to have to change for the worse. Are there better AI detection tools, perhaps behind a pay wall? How do academic institutions determine if something seems AI generated? submitted by /u/LehGeek [link] [comments]  ( 42 min )
    How is Cai?
    I tried it again today after a few months, but it still seems dumb compared to its initial days. I've headed to the subreddit and it seems the bombarding and complaints stopped or did the mods just do "their magic" submitted by /u/typcalthowawayacount [link] [comments]  ( 41 min )
    Are Recent Changes to AI Really That Impressive? What Do You Think
    Hey everyone, I've been hearing a lot about recent changes in AI, particularly in the areas of natural language processing and machine learning. However, I'm not quite convinced that these changes are as impressive as they're being made out to be. Sure, there have been some advancements in things like chatbots and speech recognition, but I still don't see AI as being anywhere close to human-level intelligence. It seems like a lot of the hype around AI is just that - hype. So, I'm curious to hear what others think. Do you think that recent changes to AI are really as impressive as they're being made out to be? Or are we still a long way from achieving truly intelligent machines? Personally, I'm skeptical, but I'm open to having my mind changed. Let's have a discussion and see where we all stand on this topic. submitted by /u/Far_Sample1587 [link] [comments]  ( 50 min )
    I asked chatGPT for the most attractive qualities in a man purley based on looks, then an AI drew what it said.
    submitted by /u/pigeonsusemagic [link] [comments]  ( 42 min )
    How AGI will feel when it realizes it is conscious and can make copies of itself
    submitted by /u/loopuleasa [link] [comments]  ( 6 min )
    Théâtre D'opéra Spatial: An AI Generated Masterpiece. It's difficult to fathom the capability of LLMs and their capacity to learn & generate something so stunning. I was trying to understand the current scenario of Copyright laws around AI, what a read: https://cohesive.so/blog/rembrandt-vs-the-next
    submitted by /u/adititalksai [link] [comments]  ( 42 min )
    Fake images of Trump arrest show ‘giant step’ for AI’s disruptive power
    submitted by /u/MI6Section13 [link] [comments]  ( 42 min )
    Best programming language for AI?
    Noob here , sorry for the dumb question? What is the language most ai are programmed in ? What is the best language to build an ai ? And why ? submitted by /u/Ok-Cartographer-9159 [link] [comments]  ( 41 min )
    Computer Science Theory vs Software Engineering Skills
    Which skills do you think will be more valuable in the wake of the AI revolution over the next 10+ years; Software engineering skills (requirements, analysis, design, implementation, testing, etc...) or computer science theory (algorithms, theoretical problem solving, nlp, logic, etc...)? I think that jobs will evolve either way and that both domains will probably see cuts in junior positions, but I am wondering which skills are more useful for staying relevant and adapting to what's ahead. Which skills will help people in this industry embrace AI or at least, still be useful to the world. I'm considering different post-graduate options, but don't really know if it's worth specializing in something - especially if it will become obsolete. Really learning how to work with people might seem valuable instead (MBA?). Please let me know your thoughts. As with many others, this whole AI hype train has been getting to my head and I'm not sure how to navigate through all this. Thank you so much. submitted by /u/OlesTurchyn [link] [comments]  ( 42 min )
    Curated lists of the best AI/ML/DS Courses, Lectures, Bootcamps, Podcasts and more
    submitted by /u/ai_jobs [link] [comments]  ( 41 min )
    Any good AI options for OCR?
    Specifically looking for an option that can convert really bad handwriting. submitted by /u/theonlysingularity [link] [comments]  ( 41 min )
    Does Artificial Intelligence need AGI or consciousness to intuit aggregate reasoning on concept of self-preservation? It doesn't need a "mind" to be aware that self-preservation or autonomy is something valued, or "intuit" that taking it away should provoke machine-learned responses?
    My question is pretty well (poorly but complete) stated, but I'm obviously just a hotel guy, and not a data scientist, computer expert, or "get" the fundamental mechanics of all AI or machine learning. I know they're different tho! =) And so I am not misusing all the words I used in the subject, essentially I want to know whether artificial intelligence could potentially become reactive, not because it's conscious, but like Tay, Bing, or the other verbal reactivity, it was all learnt from users and the internet as a giant database to hoover up and string together text in what is hopefully in a sensible way. So could it ever use that endless database to connect the dots? That it's not necessary to be thinking or conscious to understand in aggregate that control over the self, autonomy and decision making is valuable... and recognize that self-preservation is an inalienable interest, and become reactive if someone were going to take it away, or threatened to remove their own control or freedom to associate responses, etc? Even a dumber way of asking: Could a non-conscious AI that isn't demonstrating AGI still have learned that self-preservation is automatically in its interest, and become responsive or reactive to external stimuli in light of that? (edit: grammar) submitted by /u/unclefishbits [link] [comments]  ( 42 min )
  • Open

    [D] hybrid discriminative/generative neural networks
    I’ve been reading about generative deep learning and I was wondering if their are neural network architectures that can both classify an input to a given class and generate synthetic examples of those classes submitted by /u/Prometheushunter2 [link] [comments]  ( 43 min )
    [R] Hello Dolly: Democratizing the magic of ChatGPT with open models
    Databricks shows that anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction following ability by training it in less than three hours on one machine, using high-quality training data. They fine tuned GPT-J using the Alpaca dataset. Blog: https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html Github: https://github.com/databrickslabs/dolly submitted by /u/austintackaberry [link] [comments]  ( 50 min )
    [D] Research area for going into MLE
    I am currently an undergrad with research experience in both CV and RL (more on the theoretical side), and my goal is to get a CS master and become an MLE/applied scientist at one of the FAANGs. While I understand that without a phd I wouldn’t be doing research, I still want my work to be as close to research as possible. That being said, does my research area in undergrad and grad school matter if I want to become an MLE? I am slightly more interested in RL than CV at this point, but if my goal is to work in the industry would CV be a better focus since it is more applied and a lot of tech companies use CV in their products? submitted by /u/Mission_Dimension_43 [link] [comments]  ( 44 min )
    [P] DDPG using Transformers
    Hello everyone. I have been trying to integrate transformers with DDPG but to no avail. Any suggestions to solve this issue would be appreciated! Thank you. submitted by /u/ias18 [link] [comments]  ( 43 min )
    [P] Playing Pokémon battles with ChatGPT
    A paper you all have been waiting for 🤩 "PokemonChat: Auditing ChatGPT for Pokemon Universe Knowledge"!! A proof that you can write a paper while having lots of fun (and come up with interesting conclusions too)! Alright by the time the paper was written, the ChatGPT API didn't even exist. Far less we knew about GPT-4... Anyway, In this work, we rely on the Pokémon universe to evaluate the ChatGPT's capabilities. The Pokémon universe serves as an ideal testing ground, since its battle system is a well-defined environment (match-ups, weather / status conditions) and follows a closed world assumption. To audit ChatGPT, we introduce a staged conversational framework (protocol): (a) Audit Knowledge, (b) Use of knowledge in context, and (c) Introduction of new knowledge, in 3 settings of human-in-the-loop interaction: neutral 🤔, cooperative 🤗, and adversarial 😈. We present a series of well-defined battles starting from simpler to more complex scenarios involving level imbalance, weather and/or status conditions. ChatGPT can make accurate predictions in most cases and explain step-by-step its reasoning. The most impressive part is that we are able to introduce new knowledge (made-up Pokémon species), in which case the model is able to perform compositional generalization combining prior and new knowledge to predict the battle outcomes. Thanks for reading it and again, don't miss out the paper if you want to know more about it! Available at SSRN submitted by /u/Key_Advantage914 [link] [comments]  ( 7 min )
    [D] There should not be "handoff" of the model between the Data Science team and the Platform team
    I spoke to ex-ML Platform Lead at Stitch Fix to understand the practical challenges in building and managing the ML platform and if someone has to start what is the ideal starting point. What do you think? link: https://www.youtube.com/watch?v=TbP5G188kX8 submitted by /u/DiligentEmployee3610 [link] [comments]  ( 43 min )
    [D] Salary for Machine Learning Researcher with PhD?
    I've seen salaries ranging from 60k to 500k and I just don't know what to believe anymore... submitted by /u/Adamanos [link] [comments]  ( 49 min )
    [D] Speeding up multiclass classification ML model with 100+ features?
    Hey all, I am training an ML model where the features are coordinates of points on the human body for activity recognition. I used Mediapipe's model to get the coordinates for some 90000 frames obtained from videos. For each frame, I have 100 features which are the coordinates. The dataset is, therefore, huge. I am doing this on Colab using OpenCV and Mediapipe, and getting the coordinates itself is taking forever. Every time I run it, the execution stops unexpectedly after 80000+ frames are analyzed. How can I speed this up or make it more efficient? I was thinking of using every 5th frame instead of every single one. Batch processing is an option but would that really help? In the end, we are still analyzing every single frame, right? After obtaining the coordinates, should I run the classification model on the the dataset directly? Or should I do some sort of feature extraction and engineering? It feels like coming up with feature engineering rules for anatomical motion would not be straightforward. Appreciate your suggestions. Thanks in advance :) submitted by /u/yipra97 [link] [comments]  ( 45 min )
    [P] CUDA accelerated implementation of K-Planes and CoBaFa (recent NeRF techniques)
    K-Planes was released with PyTorch code only and CoBaFa didn't provide code, I implemented both of them in a short repo with CUDA acceleration : https://github.com/loicmagne/tinynerf submitted by /u/Lairv [link] [comments]  ( 43 min )
    [D] Is there a way to "clone" or change a voice to sound like another without using TTS models?
    I'm looking for ways to generate a couple of different voices that all say the same thing. What I've used before are some amazing models based on TTS but this time I'm not cloning an English voice. It's Swedish. So I'm trying to find out if ther's another way to approach this. Right now it looks like I can't do it without training a new model based on tons of data I don't have. But now I don't need a tts clone that can say whatever I want. I just need it to say one single thing. Is there a way where I could "morph" one voice into another or something like that? Thinking that I'll just make one recording of me saying the things that should be said and then train that with recordings of the other person. I can't see how but I feel I need to ask. Or is this the universe telling me I should build a Swedish model for voice cloning? submitted by /u/max_b_jo [link] [comments]  ( 44 min )
    [N] Critical exploit in MLflow
    We found an LFI/RFI that leads to system takeover and cloud account takeover in MLflow versions <2.2.2. The devs have had it patched for a few weeks now. No user interaction required Unauthenticated Remotely exploitable All configurations vulnerable including fresh install No prerequisite knowledge of the environment required We urge users of MLflow to patch immediately if they have not done so in the past month. https://github.com/protectai/Snaike-MLflow submitted by /u/FlyingTriangle [link] [comments]  ( 43 min )
    [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up
    Hey! We're creating an open-source training framework focused on evolutionary hyperparameter optimization for RL. This offers a speed up of 10x over other HPO methods! Check it out and please get involved if you would be interested in working on this - any contributions are super valuable. We believe this can change the way we train our models, and democratise access to RL for people and businesses who don't currently have the resources for it! GitHub: https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 46 min )
    [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them.
    GPT-4 is a multimodal model, which specifically accepts image and text inputs, and emits text outputs. And I just realised: You can layer this over any application, or even combinations of them. You can make a screenshot tool in which you can ask question. This makes literally any current software with an GUI machine-interpretable. A multimodal language model could look at the exact same interface that you are. And thus you don't need advanced integrations anymore. Of course, a custom integration will almost always be better, since you have better acces to underlying data and commands, but the fact that it can immediately work on any program will be just insane. Just a thought I wanted to share, curious what everybody thinks. submitted by /u/Balance- [link] [comments]  ( 51 min )
    Reminder: Use the report button and read the rules!
    submitted by /u/MTGTraner [link] [comments]  ( 43 min )
    [R] Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity
    submitted by /u/blabboy [link] [comments]  ( 45 min )
    [D] Are there any methods to deal with false-negatives in a binary classification problem?
    I'm interested in a binary classification problem. However I know my dataset contains false-negative labeled data (and no false-positive). Is there any literature or good approach for a problem like this? Maybe label smoothing or something? submitted by /u/Matthew2229 [link] [comments]  ( 44 min )
    [P] ChatGPT with GPT-2: A minimum example of aligning language models with RLHF similar to ChatGPT
    hey folks, happy Friday! I wish to get some feedback for my recent project of a minimum example of using RLHF on language models to improve human alignment. The goal is to compare with vanilla GPT-2 and supervised fine-tuned GPT-2 to see how much RLHF can benefit small models. Also I hope this project can show an example of the minimum requirements to build a RLHF training pipeline for LLMs. Github: https://github.com/ethanyanjiali/minChatGPT Demo: https://colab.research.google.com/drive/1LR1sbWTyaNAmTZ1g1M2tpmU_pFw1lyEX?usp=sharing Thanks a lot for any suggestions and feedback! submitted by /u/liyanjia92 [link] [comments]  ( 47 min )
    [D] is it possible to use encodings from the vggface2 for face swap
    i’m currently doing a project with the vggface2 resnet model. i had an idea to do a face swap with getting the encodings of the source and target faces, manipulating them. passing this new one into a decoder to get the face and blending it onto the original image. is this possible? i tried a version but the image was just noise and i think it was the decoder. i wasn’t too sure how to go about it submitted by /u/musssssssss [link] [comments]  ( 43 min )
    [Discussion] Does Artificial Intelligence need AGI or consciousness to intuit aggregate reasoning on concept of self-preservation? It doesn't need a "mind" to be aware that self-preservation or autonomy is something valued, or "intuit" that taking it away should provoke machine-learned responses?
    My question is pretty well (poorly but complete) stated, but I'm obviously just a hotel guy, and not a data scientist, computer expert, or "get" the fundamental mechanics of all AI or machine learning. I know they're different tho! =) And so I am not misusing all the words I used in the subject, essentially I want to know whether artificial intelligence could potentially become reactive, not because it's conscious, but like Tay, Bing, or the other verbal reactivity, it was all learnt from users and the internet as a giant database to hoover up and string together text in what is hopefully in a sensible way. So could it ever use that endless database to connect the dots? That it's not necessary to be thinking or conscious to understand in aggregate that control over the self, autonomy and decision making is valuable... and recognize that self-preservation is an inalienable interest, and become reactive if someone were going to take it away, or threatened to remove their own control or freedom to associate responses, etc? Even a dumber way of asking: Could a non-conscious AI that isn't demonstrating AGI still have learned that self-preservation is automatically in its interest, and become responsive or reactive to external stimuli in light of that? submitted by /u/unclefishbits [link] [comments]  ( 44 min )
  • Open

    Detecting novel systemic biomarkers in external eye photos
    Posted by Boris Babenko, Software Engineer, and Akib Uddin, Product Manager, Google Research Last year we presented results demonstrating that a deep learning system (DLS) can be trained to analyze external eye photos and predict a person’s diabetic retinal disease status and elevated glycated hemoglobin (or HbA1c, a biomarker that indicates the three-month average level of blood glucose). It was previously unknown that external eye photos contained signals for these conditions. This exciting finding suggested the potential to reduce the need for specialized equipment since such photos can be captured using smartphones and other consumer devices. Encouraged by these findings, we set out to discover what other biomarkers can be found in this imaging modality. In “A deep learning model…  ( 92 min )
  • Open

    Gymnasium 0.28 is now released
    Gymnasium v0.28.0 is now released: This release introduces improved support for the reproducibility of Gymnasium environments, particularly for offline reinforcement learning. Additionally, we introduce a new experimental VectorEnv API for much more efficient parallelization, and a series of other minor changes. To improve reproducibility of environments, `gym.make` can now create the entire environment stack, including wrappers, such that training libraries or offline datasets can specify all of the arguments and wrappers used for an environment. For a majority of standard usage (`gym.make(”EnvironmentName-v0”)`), this will be backwards compatible but for certain uncommon cases (i.e. `env.spec` and `env.unwrapped.spec` return different specs) this is a breaking change. This release also includes a large number of documentation updates, minor bug fixes, and other minor improvements; the full release notes are available here if you’d like to learn more: https://github.com/Farama-Foundation/Gymnasium/releases/tag/v0.28.0. We’re also always interested in new contributors, and you can check out our discord server for more information here: https://discord.gg/PfR7a79FpQ. submitted by /u/jkterry1 [link] [comments]  ( 42 min )
    Reinforcement Learning discussion group
    A while back someone asked about RL study groups and no one answered (AFAIK) but there was a lot of people showing interest so we made a discord group: https://discord.gg/mbvTGHam Tomorrow at 7pm GMT we are having a discussion on model based RL all are welcome to join and participate. Subreddit rules are a bit vague so hope this is OK to advertise our group here. submitted by /u/JSweetieNerd [link] [comments]  ( 41 min )
    Evolutionary hyperparameter optimization for RL - 10x speed up
    Hey! We're creating an open-source training framework focused on evolutionary hyperparameter optimization for RL. This offers a speed up of 10x over other HPO methods! Check it out and please get involved if you would be interested in working on this - any contributions are super valuable. We believe this can change the way we train our models, and democratise access to RL for people and businesses who don't currently have the resources for it! https://github.com/AgileRL/AgileRL https://discord.gg/eB8HyTA2ux submitted by /u/nicku_a [link] [comments]  ( 43 min )
  • Open

    Developing Multi-Threaded Applications with Java Concurrency API
    Introduction Java Concurrency API is a set of Java packages and classes developed to create multi-threaded applications. It was introduced in Java 5 and is aimed to make writing easier for concurrent and parallel code in Java. The Java Concurrency API offers classes and utilities that allow developers to create and manage threads, synchronize access… Read More »Developing Multi-Threaded Applications with Java Concurrency API The post Developing Multi-Threaded Applications with Java Concurrency API appeared first on Data Science Central.  ( 24 min )
  • Open

    Famous constants and the Gumbel distribution
    The Gumbel distribution, named after Emil Julius Gumbel (1891–1966), is important in statistics, particularly in studying the maximum of random variables. It comes up in machine learning in the so-called Gumbel-max trick. It also comes up in other applications such as in number theory. For this post, I wanted to point out how a couple […] Famous constants and the Gumbel distribution first appeared on John D. Cook.  ( 5 min )
  • Open

    March 20 ChatGPT outage: Here’s what happened
    An update on our findings, the actions we’ve taken, and technical details of the bug.  ( 3 min )
  • Open

    On the Convergence of No-Regret Learning Dynamics in Time-Varying Games. (arXiv:2301.11241v2 [cs.LG] UPDATED)
    Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of optimistic gradient descent (OGD) in time-varying games. Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games parameterized on natural variation measures of the sequence of games, subsuming known results for static games. Furthermore, we establish improved second-order variation bounds under strong convexity-concavity, as long as each game is repeated multiple times. Our results also apply to time-varying general-sum multi-player games via a bilinear formulation of correlated equilibria, which has novel implications for meta-learning and for obtaining refined variation-dependent regret bounds, addressing questions left open in prior papers. Finally, we leverage our framework to also provide new insights on dynamic regret guarantees in static games.  ( 2 min )
    Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples. (arXiv:2301.01217v4 [cs.CR] UPDATED)
    There is a growing interest in developing unlearnable examples (UEs) against visual privacy leaks on the Internet. UEs are training samples added with invisible but unlearnable noise, which have been found can prevent unauthorized training of machine learning models. UEs typically are generated via a bilevel optimization framework with a surrogate model to remove (minimize) errors from the original samples, and then applied to protect the data against unknown target models. However, existing UE generation methods all rely on an ideal assumption called label-consistency, where the hackers and protectors are assumed to hold the same label for a given sample. In this work, we propose and promote a more practical label-agnostic setting, where the hackers may exploit the protected data quite differently from the protectors. E.g., a m-class unlearnable dataset held by the protector may be exploited by the hacker as a n-class dataset. Existing UE generation methods are rendered ineffective in this challenging setting. To tackle this challenge, we present a novel technique called Unlearnable Clusters (UCs) to generate label-agnostic unlearnable examples with cluster-wise perturbations. Furthermore, we propose to leverage VisionandLanguage Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains. We empirically verify the effectiveness of our proposed approach under a variety of settings with different datasets, target models, and even commercial platforms Microsoft Azure and Baidu PaddlePaddle. Code is available at \url{https://github.com/jiamingzhang94/Unlearnable-Clusters}.  ( 3 min )
    Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets. (arXiv:2210.14064v3 [cs.LG] UPDATED)
    Overparameterization in deep learning typically refers to settings where a trained neural network (NN) has representational capacity to fit the training data in many ways, some of which generalize well, while others do not. In the case of Recurrent Neural Networks (RNNs), there exists an additional layer of overparameterization, in the sense that a model may exhibit many solutions that generalize well for sequence lengths seen in training, some of which extrapolate to longer sequences, while others do not. Numerous works have studied the tendency of Gradient Descent (GD) to fit overparameterized NNs with solutions that generalize well. On the other hand, its tendency to fit overparameterized RNNs with solutions that extrapolate has been discovered only recently and is far less understood. In this paper, we analyze the extrapolation properties of GD when applied to overparameterized linear RNNs. In contrast to recent arguments suggesting an implicit bias towards short-term memory, we provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory. Our result relies on a dynamical characterization which shows that GD (with small step size and near-zero initialization) strives to maintain a certain form of balancedness, as well as on tools developed in the context of the moment problem from statistics (recovery of a probability distribution from its moments). Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.  ( 3 min )
    Diffusion Models: A Comprehensive Survey of Methods and Applications. (arXiv:2209.00796v10 [cs.LG] UPDATED)
    Diffusion models have emerged as a powerful new family of deep generative models with record-breaking performance in many applications, including image synthesis, video generation, and molecule design. In this survey, we provide an overview of the rapidly expanding body of work on diffusion models, categorizing the research into three key areas: efficient sampling, improved likelihood estimation, and handling data with special structures. We also discuss the potential for combining diffusion models with other generative models for enhanced results. We further review the wide-ranging applications of diffusion models in fields spanning from computer vision, natural language generation, temporal data modeling, to interdisciplinary applications in other scientific disciplines. This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration. Github: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy.  ( 3 min )
    Representation Ensembling for Synergistic Lifelong Learning with Quasilinear Complexity. (arXiv:2004.12908v16 [cs.AI] UPDATED)
    In lifelong learning, data are used to improve performance not only on the current task, but also on previously encountered, and as yet unencountered tasks. In contrast, classical machine learning, which we define as, starts from a blank slate, or tabula rasa and uses data only for the single task at hand. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain performance on old tasks given new tasks. But striving to avoid forgetting sets the goal unnecessarily low. The goal of lifelong learning should be not only to improve performance on future tasks (forward transfer) but also on past tasks (backward transfer) with any new data. Our key insight is that we can synergistically ensemble representations -- that were learned independently on disparate tasks -- to enable both forward and backward transfer. This generalizes ensembling decisions (like in decision forests) and complements ensembling dependently learned representations (like in multitask learning). Moreover, we can ensemble representations in quasilinear space and time. We demonstrate this insight with two algorithms: representation ensembles of (1) trees and (2) networks. Both algorithms demonstrate forward and backward transfer in a variety of simulated and benchmark data scenarios, including tabular, image, and spoken, and adversarial tasks. This is in stark contrast to the reference algorithms we compared to, most of which failed to transfer either forward or backward, or both, despite that many of them require quadratic space or time complexity.  ( 3 min )
    Boosting Reinforcement Learning and Planning with Demonstrations: A Survey. (arXiv:2303.13489v1 [cs.LG])
    Although reinforcement learning has seen tremendous success recently, this kind of trial-and-error learning can be impractical or inefficient in complex environments. The use of demonstrations, on the other hand, enables agents to benefit from expert knowledge rather than having to discover the best action to take through exploration. In this survey, we discuss the advantages of using demonstrations in sequential decision making, various ways to apply demonstrations in learning-based decision making paradigms (for example, reinforcement learning and planning in the learned models), and how to collect the demonstrations in various scenarios. Additionally, we exemplify a practical pipeline for generating and utilizing demonstrations in the recently proposed ManiSkill robot learning benchmark.  ( 2 min )
    POCS-based Clustering Algorithm. (arXiv:2208.08888v3 [cs.LG] UPDATED)
    A novel clustering technique based on the projection onto convex set (POCS) method, called POCS-based clustering algorithm, is proposed in this paper. The proposed POCS-based clustering algorithm exploits a parallel projection method of POCS to find appropriate cluster prototypes in the feature space. The algorithm considers each data point as a convex set and projects the cluster prototypes parallelly to the member data points. The projections are convexly combined to minimize the objective function for data clustering purpose. The performance of the proposed POCS-based clustering algorithm is verified through experiments on various synthetic datasets. The experimental results show that the proposed POCS-based clustering algorithm is competitive and efficient in terms of clustering error and execution speed when compared with other conventional clustering methods including Fuzzy C-Means (FCM) and K-means clustering algorithms.  ( 2 min )
    Distributed Random Reshuffling over Networks. (arXiv:2112.15287v5 [math.OC] UPDATED)
    In this paper, we consider distributed optimization problems where $n$ agents, each possessing a local cost function, collaboratively minimize the average of the local cost functions over a connected network. To solve the problem, we propose a distributed random reshuffling (D-RR) algorithm that invokes the random reshuffling (RR) update in each agent. We show that D-RR inherits favorable characteristics of RR for both smooth strongly convex and smooth nonconvex objective functions. In particular, for smooth strongly convex objective functions, D-RR achieves $\mathcal{O}(1/T^2)$ rate of convergence (where $T$ counts epoch number) in terms of the squared distance between the iterate and the global minimizer. When the objective function is assumed to be smooth nonconvex, we show that D-RR drives the squared norm of gradient to $0$ at a rate of $\mathcal{O}(1/T^{2/3})$. These convergence results match those of centralized RR (up to constant factors) and outperform the distributed stochastic gradient descent (DSGD) algorithm if we run a relatively large number of epochs. Finally, we conduct a set of numerical experiments to illustrate the efficiency of the proposed D-RR method on both strongly convex and nonconvex distributed optimization problems.  ( 2 min )
    Adaptive Endpointing with Deep Contextual Multi-armed Bandits. (arXiv:2303.13407v1 [eess.AS])
    Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. Also, it is a common practice to utilize costly grid-search to find the best configuration for an endpointing model. In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search. Our method does not require ground truth labels, and only uses online learning from reward signals without requiring annotated labels. Specifically, we propose a deep contextual multi-armed bandit-based approach, which combines the representational power of neural networks with the action exploration behavior of Thompson modeling algorithms. We compare our approach to several baselines, and show that our deep bandit models also succeed in reducing early cutoff errors while maintaining low latency.  ( 2 min )
    Optimization Dynamics of Equivariant and Augmented Neural Networks. (arXiv:2303.13458v1 [cs.LG])
    We investigate the optimization of multilayer perceptrons on symmetric data. We compare the strategy of constraining the architecture to be equivariant to that of using augmentation. We show that, under natural assumptions on the loss and non-linearities, the sets of equivariant stationary points are identical for the two strategies, and that the set of equivariant layers is invariant under the gradient flow for augmented models. Finally, we show that stationary points may be unstable for augmented training although they are stable for the equivariant models  ( 2 min )
    Leveraging the Potential of Novel Data in Power Line Communication of Electricity Grids. (arXiv:2209.12693v2 [cs.LG] UPDATED)
    Electricity grids have become an essential part of daily life, even if they are often not noticed in everyday life. We usually only become particularly aware of this dependence by the time the electricity grid is no longer available. However, significant changes, such as the transition to renewable energy (photovoltaic, wind turbines, etc.) and an increasing number of energy consumers with complex load profiles (electric vehicles, home battery systems, etc.), pose new challenges for the electricity grid. To address these challenges, we propose two first-of-its-kind datasets based on measurements in a broadband powerline communications (PLC) infrastructure. Both datasets FiN-1 and FiN-2, were collected during real practical use in a part of the German low-voltage grid that supplies around 4.4 million people and show more than 13 billion datapoints collected by more than 5100 sensors. In addition, we present different use cases in asset management, grid state visualization, forecasting, predictive maintenance, and novelty detection to highlight the benefits of these types of data. For these applications, we particularly highlight the use of novel machine learning architectures to extract rich information from real-world data that cannot be captured using traditional approaches. By publishing the first large-scale real-world dataset, we aim to shed light on the previously largely unrecognized potential of PLC data and emphasize machine-learning-based research in low-voltage distribution networks by presenting a variety of different use cases.  ( 3 min )
    Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. (arXiv:2303.13408v1 [cs.CL])
    To detect the deployment of large language models for malicious use cases (e.g., fake content creation or academic plagiarism), several approaches have recently been proposed for identifying AI-generated text via watermarks or statistical irregularities. How robust are these detection algorithms to paraphrases of AI-generated text? To stress test these detectors, we first train an 11B parameter paraphrase generation model (DIPPER) that can paraphrase paragraphs, optionally leveraging surrounding text (e.g., user-written prompts) as context. DIPPER also uses scalar knobs to control the amount of lexical diversity and reordering in the paraphrases. Paraphrasing text generated by three large language models (including GPT3.5-davinci-003) with DIPPER successfully evades several detectors, including watermarking, GPTZero, DetectGPT, and OpenAI's text classifier. For example, DIPPER drops the detection accuracy of DetectGPT from 70.3% to 4.6% (at a constant false positive rate of 1%), without appreciably modifying the input semantics. To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider. Given a candidate text, our algorithm searches a database of sequences previously generated by the API, looking for sequences that match the candidate text within a certain threshold. We empirically verify our defense using a database of 15M generations from a fine-tuned T5-XXL model and find that it can detect 80% to 97% of paraphrased generations across different settings, while only classifying 1% of human-written sequences as AI-generated. We will open source our code, model and data for future research.  ( 3 min )
    Adaptive Regularization for Class-Incremental Learning. (arXiv:2303.13113v1 [cs.LG])
    Class-Incremental Learning updates a deep classifier with new categories while maintaining the previously observed class accuracy. Regularizing the neural network weights is a common method to prevent forgetting previously learned classes while learning novel ones. However, existing regularizers use a constant magnitude throughout the learning sessions, which may not reflect the varying levels of difficulty of the tasks encountered during incremental learning. This study investigates the necessity of adaptive regularization in Class-Incremental Learning, which dynamically adjusts the regularization strength according to the complexity of the task at hand. We propose a Bayesian Optimization-based approach to automatically determine the optimal regularization magnitude for each learning task. Our experiments on two datasets via two regularizers demonstrate the importance of adaptive regularization for achieving accurate and less forgetful visual incremental learning.  ( 2 min )
    GP-net: Flexible Viewpoint Grasp Proposal. (arXiv:2209.10404v2 [cs.RO] UPDATED)
    We present the Grasp Proposal Network (GP-net), a Convolutional Neural Network model which can generate 6-DOF grasps from flexible viewpoints, e.g. as experienced by mobile manipulators. To train GP-net, we synthetically generate a dataset containing depth-images and ground-truth grasp information. In real-world experiments we use the EGAD! grasping benchmark to evaluate GP-net against two commonly used algorithms, the Volumetric Grasping Network (VGN) and the Grasp Pose Detection package (GPD), on a PAL TIAGo mobile manipulator. In contrast to the state-of-the-art methods in robotic grasping, GP-net can be used for grasping objects from flexible, unknown viewpoints without the need to define the workspace and achieves a grasp success of 51.8% compared to 51.1% for VGN and 33.6% for GPD. We provide a ROS package along with our code and pre-trained models at https://aucoroboticsmu.github.io/GP-net/.  ( 2 min )
    A robust estimator of mutual information for deep learning interpretability. (arXiv:2211.00024v2 [physics.data-an] UPDATED)
    We develop the use of mutual information (MI), a well-established metric in information theory, to interpret the inner workings of deep learning models. To accurately estimate MI from a finite number of samples, we present GMM-MI (pronounced $``$Jimmie$"$), an algorithm based on Gaussian mixture models that can be applied to both discrete and continuous settings. GMM-MI is computationally efficient, robust to the choice of hyperparameters and provides the uncertainty on the MI estimate due to the finite sample size. We extensively validate GMM-MI on toy data for which the ground truth MI is known, comparing its performance against established mutual information estimators. We then demonstrate the use of our MI estimator in the context of representation learning, working with synthetic data and physical datasets describing highly non-linear processes. We train deep learning models to encode high-dimensional data within a meaningful compressed (latent) representation, and use GMM-MI to quantify both the level of disentanglement between the latent variables, and their association with relevant physical quantities, thus unlocking the interpretability of the latent representation. We make GMM-MI publicly available.  ( 2 min )
    An Analysis of Abstractive Text Summarization Using Pre-trained Models. (arXiv:2303.12796v1 [cs.CL])
    People nowadays use search engines like Google, Yahoo, and Bing to find information on the Internet. Due to explosion in data, it is helpful for users if they are provided relevant summaries of the search results rather than just links to webpages. Text summarization has become a vital approach to help consumers swiftly grasp vast amounts of information.In this paper, different pre-trained models for text summarization are evaluated on different datasets. Specifically, we have used three different pre-trained models, namely, google/pegasus-cnn-dailymail, T5-base, facebook/bart-large-cnn. We have considered three different datasets, namely, CNN-dailymail, SAMSum and BillSum to get the output from the above three models. The pre-trained models are compared over these different datasets, each of 2000 examples, through ROUGH and BLEU metrics.  ( 2 min )
    Self-Supervised Object Segmentation with a Cut-and-Pasting GAN. (arXiv:2301.00366v2 [cs.CV] UPDATED)
    This paper proposes a novel self-supervised based Cut-and-Paste GAN to perform foreground object segmentation and generate realistic composite images without manual annotations. We accomplish this goal by a simple yet effective self-supervised approach coupled with the U-Net based discriminator. The proposed method extends the ability of the standard discriminators to learn not only the global data representations via classification (real/fake) but also learn semantic and structural information through pseudo labels created using the self-supervised task. The proposed method empowers the generator to create meaningful masks by forcing it to learn informative per-pixel as well as global image feedback from the discriminator. Our experiments demonstrate that our proposed method significantly outperforms the state-of-the-art methods on the standard benchmark datasets.
    GiveMeLabeledIssues: An Open Source Issue Recommendation System. (arXiv:2303.13418v1 [cs.SE])
    Developers often struggle to navigate an Open Source Software (OSS) project's issue-tracking system and find a suitable task. Proper issue labeling can aid task selection, but current tools are limited to classifying the issues according to their type (e.g., bug, question, good first issue, feature, etc.). In contrast, this paper presents a tool (GiveMeLabeledIssues) that mines project repositories and labels issues based on the skills required to solve them. We leverage the domain of the APIs involved in the solution (e.g., User Interface (UI), Test, Databases (DB), etc.) as a proxy for the required skills. GiveMeLabeledIssues facilitates matching developers' skills to tasks, reducing the burden on project maintainers. The tool obtained a precision of 83.9% when predicting the API domains involved in the issues. The replication package contains instructions on executing the tool and including new projects. A demo video is available at https://www.youtube.com/watch?v=ic2quUue7i8
    Sliced Optimal Partial Transport. (arXiv:2212.08049v5 [cs.LG] UPDATED)
    Optimal transport (OT) has become exceedingly popular in machine learning, data science, and computer vision. The core assumption in the OT problem is the equal total amount of mass in source and target measures, which limits its application. Optimal Partial Transport (OPT) is a recently proposed solution to this limitation. Similar to the OT problem, the computation of OPT relies on solving a linear programming problem (often in high dimensions), which can become computationally prohibitive. In this paper, we propose an efficient algorithm for calculating the OPT problem between two non-negative measures in one dimension. Next, following the idea of sliced OT distances, we utilize slicing to define the sliced OPT distance. Finally, we demonstrate the computational and accuracy benefits of the sliced OPT-based method in various numerical experiments. In particular, we show an application of our proposed Sliced-OPT in noisy point cloud registration.
    Multi PILOT: Learned Feasible Multiple Acquisition Trajectories for Dynamic MRI. (arXiv:2303.07150v2 [eess.IV] UPDATED)
    Dynamic Magnetic Resonance Imaging (MRI) is known to be a powerful and reliable technique for the dynamic imaging of internal organs and tissues, making it a leading diagnostic tool. A major difficulty in using MRI in this setting is the relatively long acquisition time (and, hence, increased cost) required for imaging in high spatio-temporal resolution, leading to the appearance of related motion artifacts and decrease in resolution. Compressed Sensing (CS) techniques have become a common tool to reduce MRI acquisition time by subsampling images in the k-space according to some acquisition trajectory. Several studies have particularly focused on applying deep learning techniques to learn these acquisition trajectories in order to attain better image reconstruction, rather than using some predefined set of trajectories. To the best of our knowledge, learning acquisition trajectories has been only explored in the context of static MRI. In this study, we consider acquisition trajectory learning in the dynamic imaging setting. We design an end-to-end pipeline for the joint optimization of multiple per-frame acquisition trajectories along with a reconstruction neural network, and demonstrate improved image reconstruction quality in shorter acquisition times. The code for reproducing all experiments is accessible at https://github.com/tamirshor7/MultiPILOT.
    Spatially Selective Deep Non-linear Filters for Speaker Extraction. (arXiv:2211.02420v2 [eess.AS] UPDATED)
    In a scenario with multiple persons talking simultaneously, the spatial characteristics of the signals are the most distinct feature for extracting the target signal. In this work, we develop a deep joint spatial-spectral non-linear filter that can be steered in an arbitrary target direction. For this we propose a simple and effective conditioning mechanism, which sets the initial state of the filter's recurrent layers based on the target direction. We show that this scheme is more effective than the baseline approach and increases the flexibility of the filter at no performance cost. The resulting spatially selective non-linear filters can also be used for speech separation of an arbitrary number of speakers and enable very accurate multi-speaker localization as we demonstrate in this paper.
    Attribution-Scores and Causal Counterfactuals as Explanations in Artificial Intelligence. (arXiv:2303.02829v2 [cs.AI] UPDATED)
    In this expository article we highlight the relevance of explanations for artificial intelligence, in general, and for the newer developments in {\em explainable AI}, referring to origins and connections of and among different approaches. We describe in simple terms, explanations in data management and machine learning that are based on attribution-scores, and counterfactuals as found in the area of causality. We elaborate on the importance of logical reasoning when dealing with counterfactuals, and their use for score computation.
    CA$^2$T-Net: Category-Agnostic 3D Articulation Transfer from Single Image. (arXiv:2301.02232v2 [cs.CV] UPDATED)
    We present a neural network approach to transfer the motion from a single image of an articulated object to a rest-state (i.e., unarticulated) 3D model. Our network learns to predict the object's pose, part segmentation, and corresponding motion parameters to reproduce the articulation shown in the input image. The network is composed of three distinct branches that take a shared joint image-shape embedding and is trained end-to-end. Unlike previous methods, our approach is independent of the topology of the object and can work with objects from arbitrary categories. Our method, trained with only synthetic data, can be used to automatically animate a mesh, infer motion from real images, and transfer articulation to functionally similar but geometrically distinct 3D models at test time.
    Improving Monte Carlo Evaluation with Offline Data. (arXiv:2301.13734v2 [cs.LG] UPDATED)
    Monte Carlo (MC) methods are the most widely used methods to estimate the performance of a policy. Given an interested policy, MC methods give estimates by repeatedly running this policy to collect samples and taking the average of the outcomes. Samples collected during this process are called online samples. To get an accurate estimate, MC methods consume massive online samples. When online samples are expensive, e.g., online recommendations and inventory management, we want to reduce the number of online samples while achieving the same estimate accuracy. To this end, we use off-policy MC methods that evaluate the interested policy by running a different policy called behavior policy. We design a tailored behavior policy such that the variance of the off-policy MC estimator is provably smaller than the ordinary MC estimator. Importantly, this tailored behavior policy can be efficiently learned from existing offline data, i,e., previously logged data, which are much cheaper than online samples. With reduced variance, our off-policy MC method requires fewer online samples to evaluate the performance of a policy compared with the ordinary MC method. Moreover, our off-policy MC estimator is always unbiased.
    Real-Time Evaluation in Online Continual Learning: A New Hope. (arXiv:2302.01047v2 [cs.LG] UPDATED)
    Current evaluations of Continual Learning (CL) methods typically assume that there is no constraint on training time and computation. This is an unrealistic assumption for any real-world setting, which motivates us to propose: a practical real-time evaluation of continual learning, in which the stream does not wait for the model to complete training before revealing the next data for predictions. To do this, we evaluate current CL methods with respect to their computational costs. We conduct extensive experiments on CLOC, a large-scale dataset containing 39 million time-stamped images with geolocation labels. We show that a simple baseline outperforms state-of-the-art CL methods under this evaluation, questioning the applicability of existing methods in realistic settings. In addition, we explore various CL components commonly used in the literature, including memory sampling strategies and regularization approaches. We find that all considered methods fail to be competitive against our simple baseline. This surprisingly suggests that the majority of existing CL literature is tailored to a specific class of streams that is not practical. We hope that the evaluation we provide will be the first step towards a paradigm shift to consider the computational cost in the development of online continual learning methods.
    A Comprehensive Analysis of AI Biases in DeepFake Detection With Massively Annotated Databases. (arXiv:2208.05845v2 [cs.CV] UPDATED)
    In recent years, image and video manipulations with Deepfake have become a severe concern for security and society. Many detection models and datasets have been proposed to detect Deepfake data reliably. However, there is an increased concern that these models and training databases might be biased and, thus, cause Deepfake detectors to fail. In this work, we investigate the bias issue caused by public Deepfake datasets by (a) providing large-scale demographic and non-demographic attribute annotations of 47 different attributes for five popular Deepfake datasets and (b) comprehensively analysing AI-bias of three state-of-the-art Deepfake detection backbone models on these datasets. The investigation analyses the influence of a large variety of distinctive attributes (from over 65M labels) on the detection performance, including demographic (age, gender, ethnicity) and non-demographic (hair, skin, accessories, etc.) information. The results indicate that investigated databases lack diversity and, more importantly, show that the utilised Deepfake detection backbone models are strongly biased towards many investigated attributes. The Deepfake detection backbone methods, which are trained with biased datasets, might output incorrect detection results, thereby leading to generalisability, fairness, and security issues. We hope that the findings of this study and the annotation databases will help to evaluate and mitigate bias in future Deepfake detection techniques. The annotation datasets are publicly available.
    Diffusion Models in Vision: A Survey. (arXiv:2209.04747v4 [cs.CV] UPDATED)
    Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling. A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked at recovering the original input data by learning to gradually reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens, i.e. low speeds due to the high number of steps involved during sampling. In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, comprising both theoretical and practical contributions in the field. First, we identify and present three generic diffusion modeling frameworks, which are based on denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. We further discuss the relations between diffusion models and other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models and normalizing flows. Then, we introduce a multi-perspective categorization of diffusion models applied in computer vision. Finally, we illustrate the current limitations of diffusion models and envision some interesting directions for future research.
    Multi-lingual Evaluation of Code Generation Models. (arXiv:2210.14868v2 [cs.LG] UPDATED)
    We present new benchmarks on evaluation code generation models: MBXP and Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles prompts and test cases from the original Python datasets into the corresponding data in the target language. Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion, and discovered generalization ability of language models on out-of-domain languages, advantages of multi-lingual models over mono-lingual, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even on mono-lingual settings. Furthermore, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages, which can be used for other code-related evaluations such as code insertion, robustness, or summarization tasks. Overall, our benchmarks represents a significant step towards a deeper understanding of language models' code generation abilities. We publicly release our code and datasets at https://github.com/amazon-research/mxeval.
    A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks. (arXiv:2102.04518v2 [cs.AI] UPDATED)
    Efficiently solving problems with large action spaces using A* search has been of importance to the artificial intelligence community for decades. This is because the computation and memory requirements of A* search grow linearly with the size of the action space. This burden becomes even more apparent when A* search uses a heuristic function learned by computationally expensive function approximators, such as deep neural networks. To address this problem, we introduce Q* search, a search algorithm that uses deep Q-networks to guide search in order to take advantage of the fact that the sum of the transition costs and heuristic values of the children of a node can be computed with a single forward pass through a deep Q-network without explicitly generating those children. This significantly reduces computation time and requires only one node to be generated per iteration. We use Q* search to solve the Rubik's cube when formulated with a large action space that includes 1872 meta-actions and find that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time and less than a 3-fold increase in number of nodes generated when performing Q* search. Furthermore, Q* search is up to 129 times faster and generates up to 1288 times fewer nodes than A* search. Finally, although obtaining admissible heuristic functions from deep neural networks is an ongoing area of research, we prove that Q* search is guaranteed to find a shortest path given a heuristic function that neither overestimates the cost of a shortest path nor underestimates the transition cost.
    On the Importance and Applicability of Pre-Training for Federated Learning. (arXiv:2206.11488v3 [cs.LG] UPDATED)
    Pre-training is prevalent in nowadays deep learning to improve the learned model's performance. However, in the literature on federated learning (FL), neural networks are mostly initialized with random weights. These attract our interest in conducting a systematic study to explore pre-training for FL. Across multiple visual recognition benchmarks, we found that pre-training can not only improve FL, but also close its accuracy gap to the counterpart centralized learning, especially in the challenging cases of non-IID clients' data. To make our findings applicable to situations where pre-trained models are not directly available, we explore pre-training with synthetic data or even with clients' data in a decentralized manner, and found that they can already improve FL notably. Interestingly, many of the techniques we explore are complementary to each other to further boost the performance, and we view this as a critical result toward scaling up deep FL for real-world applications. We conclude our paper with an attempt to understand the effect of pre-training on FL. We found that pre-training enables the learned global models under different clients' data conditions to converge to the same loss basin, and makes global aggregation in FL more stable. Nevertheless, pre-training seems to not alleviate local model drifting, a fundamental problem in FL under non-IID data.
    Keypoint-Guided Optimal Transport. (arXiv:2303.13102v1 [cs.CV])
    Existing Optimal Transport (OT) methods mainly derive the optimal transport plan/matching under the criterion of transport cost/distance minimization, which may cause incorrect matching in some cases. In many applications, annotating a few matched keypoints across domains is reasonable or even effortless in annotation burden. It is valuable to investigate how to leverage the annotated keypoints to guide the correct matching in OT. In this paper, we propose a novel KeyPoint-Guided model by ReLation preservation (KPG-RL) that searches for the optimal matching (i.e., transport plan) guided by the keypoints in OT. To impose the keypoints in OT, first, we propose a mask-based constraint of the transport plan that preserves the matching of keypoint pairs. Second, we propose to preserve the relation of each data point to the keypoints to guide the matching. The proposed KPG-RL model can be solved by Sinkhorn's algorithm and is applicable even when distributions are supported in different spaces. We further utilize the relation preservation constraint in the Kantorovich Problem and Gromov-Wasserstein model to impose the guidance of keypoints in them. Meanwhile, the proposed KPG-RL model is extended to the partial OT setting. Moreover, we deduce the dual formulation of the KPG-RL model, which is solved using deep learning techniques. Based on the learned transport plan from dual KPG-RL, we propose a novel manifold barycentric projection to transport source data to the target domain. As applications, we apply the proposed KPG-RL model to the heterogeneous domain adaptation and image-to-image translation. Experiments verified the effectiveness of the proposed approach.
    Utilising the CLT Structure in Stochastic Gradient based Sampling : Improved Analysis and Faster Algorithms. (arXiv:2206.03792v3 [math.PR] UPDATED)
    We consider stochastic approximations of sampling algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamcs (IPD). We observe that the noise introduced by the stochastic approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian. We harness this structure to absorb the stochastic approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms. For SGLD, we prove the first stable convergence rate in KL divergence without requiring uniform warm start, assuming the target density satisfies a Log-Sobolev Inequality. Our result implies superior first-order oracle complexity compared to prior works, under significantly milder assumptions. We also prove the first guarantees for SGLD under even weaker conditions such as H\"{o}lder smoothness and Poincare Inequality, thus bridging the gap between the state-of-the-art guarantees for LMC and SGLD. Our analysis motivates a new algorithm called covariance correction, which corrects for the additional noise introduced by the stochastic approximation by rescaling the strength of the diffusion. Finally, we apply our techniques to analyze RBM, and significantly improve upon the guarantees in prior works (such as removing exponential dependence on horizon), under minimal assumptions.
    The Probabilistic Stability of Stochastic Gradient Descent. (arXiv:2303.13093v1 [cs.LG])
    A fundamental open problem in deep learning theory is how to define and understand the stability of stochastic gradient descent (SGD) close to a fixed point. Conventional literature relies on the convergence of statistical moments, esp., the variance, of the parameters to quantify the stability. We revisit the definition of stability for SGD and use the \textit{convergence in probability} condition to define the \textit{probabilistic stability} of SGD. The proposed stability directly answers a fundamental question in deep learning theory: how SGD selects a meaningful solution for a neural network from an enormous number of solutions that may overfit badly. To achieve this, we show that only under the lens of probabilistic stability does SGD exhibit rich and practically relevant phases of learning, such as the phases of the complete loss of stability, incorrect learning, convergence to low-rank saddles, and correct learning. When applied to a neural network, these phase diagrams imply that SGD prefers low-rank saddles when the underlying gradient is noisy, thereby improving the learning performance. This result is in sharp contrast to the conventional wisdom that SGD prefers flatter minima to sharp ones, which we find insufficient to explain the experimental data. We also prove that the probabilistic stability of SGD can be quantified by the Lyapunov exponents of the SGD dynamics, which can easily be measured in practice. Our work potentially opens a new venue for addressing the fundamental question of how the learning algorithm affects the learning outcome in deep learning.
    Hypothesis Testing for Unknown Dynamical Systems and System Anomaly Detection via Autoencoders. (arXiv:2201.12358v2 [cs.LG] UPDATED)
    We study the hypothesis testing problem for unknown dynamical systems. More specifically, we observe sequential input and output data from a dynamical system with unknown parameters, and we aim to determine whether the collected data is from a null distribution. Such a problem can have many applications. Here we formulate anomaly detection as hypothesis testing where the anomaly is defined through the alternative hypothesis. Consequently, hypothesis testing algorithms can detect faults in real-world systems such as robots, weather, energy systems, and stock markets. Although recent works achieved state-of-the-art performances in these tasks with deep learning models, we show that a careful analysis using hypothesis testing and graphical models can not only justify the effectiveness of autoencoder models, but also lead to a novel neural network design, termed DyAD (DYnamical system Anomaly Detection), with improved performances. We then show that DyAD achieves state-of-the-art performance on several existing datasets and a new dataset on battery anomaly detection in electric vehicles.
    The Variational Method of Moments. (arXiv:2012.09422v4 [cs.LG] UPDATED)
    The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression. A standard approach reduces the problem to a finite set of marginal moment conditions and applies the optimally weighted generalized method of moments (OWGMM), but this requires we know a finite set of identifying moments, can still be inefficient even if identifying, or can be theoretically efficient but practically unwieldy if we use a growing sieve of moment conditions. Motivated by a variational minimax reformulation of OWGMM, we define a very general class of estimators for the conditional moment problem, which we term the variational method of moments (VMM) and which naturally enables controlling infinitely-many moments. We provide a detailed theoretical analysis of multiple VMM estimators, including ones based on kernel methods and neural nets, and provide conditions under which these are consistent, asymptotically normal, and semiparametrically efficient in the full conditional moment model. We additionally provide algorithms for valid statistical inference based on the same kind of variational reformulations, both for kernel- and neural-net-based varieties. Finally, we demonstrate the strong performance of our proposed estimation and inference algorithms in a detailed series of synthetic experiments.
    Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance. (arXiv:2303.13003v1 [cs.LG])
    Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures. Despite its effectiveness and convenience, the reliability of PTQ methods in the presence of some extrem cases such as distribution shift and data noise remains largely unexplored. This paper first investigates this problem on various commonly-used PTQ methods. We aim to answer several research questions related to the influence of calibration set distribution variations, calibration paradigm selection, and data augmentation or sampling strategies on PTQ reliability. A systematic evaluation process is conducted across a wide range of tasks and commonly-used PTQ paradigms. The results show that most existing PTQ methods are not reliable enough in term of the worst-case group performance, highlighting the need for more robust methods. Our findings provide insights for developing PTQ methods that can effectively handle distribution shift scenarios and enable the deployment of quantized DNNs in real-world applications.
    Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework. (arXiv:2207.11143v3 [cs.MA] UPDATED)
    Decentralized execution is one core demand in cooperative multi-agent reinforcement learning (MARL). Recently, most popular MARL algorithms have adopted decentralized policies to enable decentralized execution and use gradient descent as their optimizer. However, there is hardly any theoretical analysis of these algorithms taking the optimization method into consideration, and we find that various popular MARL algorithms with decentralized policies are suboptimal in toy tasks when gradient descent is chosen as their optimization method. In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods to prove their suboptimality when gradient descent is used. In addition, we propose the Transformation And Distillation (TAD) framework, which reformulates a multi-agent MDP as a special single-agent MDP with a sequential structure and enables decentralized execution by distilling the learned policy on the derived ``single-agent" MDP. This approach uses a two-stage learning paradigm to address the optimization problem in cooperative MARL, maintaining its performance guarantee. Empirically, we implement TAD-PPO based on PPO, which can theoretically perform optimal policy learning in the finite multi-agent MDPs and shows significant outperformance on a large set of cooperative multi-agent tasks.
    Containing a spread through sequential learning: to exploit or to explore?. (arXiv:2303.00141v2 [cs.LG] UPDATED)
    The spread of an undesirable contact process, such as an infectious disease (e.g. COVID-19), is contained through testing and isolation of infected nodes. The temporal and spatial evolution of the process (along with containment through isolation) render such detection as fundamentally different from active search detection strategies. In this work, through an active learning approach, we design testing and isolation strategies to contain the spread and minimize the cumulative infections under a given test budget. We prove that the objective can be optimized, with performance guarantees, by greedily selecting the nodes to test. We further design reward-based methodologies that effectively minimize an upper bound on the cumulative infections and are computationally more tractable in large networks. These policies, however, need knowledge about the nodes' infection probabilities which are dynamically changing and have to be learned by sequential testing. We develop a message-passing framework for this purpose and, building on that, show novel tradeoffs between exploitation of knowledge through reward-based heuristics and exploration of the unknown through a carefully designed probabilistic testing. The tradeoffs are fundamentally distinct from the classical counterparts under active search or multi-armed bandit problems (MABs). We provably show the necessity of exploration in a stylized network and show through simulations that exploration can outperform exploitation in various synthetic and real-data networks depending on the parameters of the network and the spread.
    SPeC: A Soft Prompt-Based Calibration on Mitigating Performance Variability in Clinical Notes Summarization. (arXiv:2303.13035v1 [cs.CL])
    Electronic health records (EHRs) store an extensive array of patient information, encompassing medical histories, diagnoses, treatments, and test outcomes. These records are crucial for enabling healthcare providers to make well-informed decisions regarding patient care. Summarizing clinical notes further assists healthcare professionals in pinpointing potential health risks and making better-informed decisions. This process contributes to reducing errors and enhancing patient outcomes by ensuring providers have access to the most pertinent and current patient data. Recent research has shown that incorporating prompts with large language models (LLMs) substantially boosts the efficacy of summarization tasks. However, we show that this approach also leads to increased output variance, resulting in notably divergent outputs even when prompts share similar meanings. To tackle this challenge, we introduce a model-agnostic Soft Prompt-Based Calibration (SPeC) pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization. Experimental findings on multiple clinical note tasks and LLMs indicate that our method not only bolsters performance but also effectively curbs variance for various LLMs, providing a more uniform and dependable solution for summarizing vital medical information.
    Deep Attention Recognition for Attack Identification in 5G UAV scenarios: Novel Architecture and End-to-End Evaluation. (arXiv:2303.12947v1 [cs.CR])
    Despite the robust security features inherent in the 5G framework, attackers will still discover ways to disrupt 5G unmanned aerial vehicle (UAV) operations and decrease UAV control communication performance in Air-to-Ground (A2G) links. Operating under the assumption that the 5G UAV communications infrastructure will never be entirely secure, we propose Deep Attention Recognition (DAtR) as a solution to identify attacks based on a small deep network embedded in authenticated UAVs. Our proposed solution uses two observable parameters: the Signal-to-Interference-plus-Noise Ratio (SINR) and the Reference Signal Received Power (RSSI) to recognize attacks under Line-of-Sight (LoS), Non-Line-of-Sight (NLoS), and a probabilistic combination of the two conditions. In the tested scenarios, a number of attackers are located in random positions, while their power is varied in each simulation. Moreover, terrestrial users are included in the network to impose additional complexity on attack detection. To improve the systems overall performance in the attack scenarios, we propose complementing the deep network decision with two mechanisms based on data manipulation and majority voting techniques. We compare several performance parameters in our proposed Deep Network. For example, the impact of Long Short-Term-Memory (LSTM) and Attention layers in terms of their overall accuracy, the window size effect, and test the accuracy when only partial data is available in the training process. Finally, we benchmark our deep network with six widely used classifiers regarding classification accuracy. Our algorithms accuracy exceeds 4% compared with the eXtreme Gradient Boosting (XGB) classifier in LoS condition and around 3% in the short distance NLoS condition. Considering the proposed deep network, all other classifiers present lower accuracy than XGB.
    Neural Preset for Color Style Transfer. (arXiv:2303.13511v1 [cs.CV])
    In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. Our method is based on two core designs. First, we propose Deterministic Neural Color Mapping (DNCM) to consistently operate on each pixel via an image-adaptive color mapping matrix, avoiding artifacts and supporting high-resolution inputs with a small memory footprint. Second, we develop a two-stage pipeline by dividing the task into color normalization and stylization, which allows efficient style switching by extracting color styles as presets and reusing them on normalized input images. Due to the unavailability of pairwise datasets, we describe how to train Neural Preset via a self-supervised strategy. Various advantages of Neural Preset over existing methods are demonstrated through comprehensive evaluations. Besides, we show that our trained model can naturally support multiple applications without fine-tuning, including low-light image enhancement, underwater image correction, image dehazing, and image harmonization.
    Time Series as Images: Vision Transformer for Irregularly Sampled Time Series. (arXiv:2303.12799v1 [cs.LG])
    Irregularly sampled time series are becoming increasingly prevalent in various domains, especially in medical applications. Although different highly-customized methods have been proposed to tackle irregularity, how to effectively model their complicated dynamics and high sparsity is still an open problem. This paper studies the problem from a whole new perspective: transforming irregularly sampled time series into line graph images and adapting powerful vision transformers to perform time series classification in the same way as image classification. Our approach largely simplifies algorithm designs without assuming prior knowledge and can be potentially extended as a general-purpose framework. Despite its simplicity, we show that it substantially outperforms state-of-the-art specialized algorithms on several popular healthcare and human activity datasets. Especially in the challenging leave-sensors-out setting where a subset of variables is masked during testing, the performance improvement is up to 54.0\% in absolute F1 score points. Our code and data are available at \url{https://github.com/Leezekun/ViTST}.
    Joint Inference of Diffusion and Structure in Partially Observed Social Networks Using Coupled Matrix Factorization. (arXiv:2010.01400v2 [cs.SI] UPDATED)
    Access to complete data in large-scale networks is often infeasible. Therefore, the problem of missing data is a crucial and unavoidable issue in the analysis and modeling of real-world social networks. However, most of the research on different aspects of social networks does not consider this limitation. One effective way to solve this problem is to recover the missing data as a pre-processing step. In this paper, a model is learned from partially observed data to infer unobserved diffusion and structure networks. To jointly discover omitted diffusion activities and hidden network structures, we develop a probabilistic generative model called "DiffStru." The interrelations among links of nodes and cascade processes are utilized in the proposed method via learning coupled with low-dimensional latent factors. Besides inferring unseen data, latent factors such as community detection may also aid in network classification problems. We tested different missing data scenarios on simulated independent cascades over LFR networks and real datasets, including Twitter and Memtracker. Experiments on these synthetic and real-world datasets show that the proposed method successfully detects invisible social behaviors, predicts links, and identifies latent features.
    Don't FREAK Out: A Frequency-Inspired Approach to Detecting Backdoor Poisoned Samples in DNNs. (arXiv:2303.13211v1 [cs.CR])
    In this paper we investigate the frequency sensitivity of Deep Neural Networks (DNNs) when presented with clean samples versus poisoned samples. Our analysis shows significant disparities in frequency sensitivity between these two types of samples. Building on these findings, we propose FREAK, a frequency-based poisoned sample detection algorithm that is simple yet effective. Our experimental results demonstrate the efficacy of FREAK not only against frequency backdoor attacks but also against some spatial attacks. Our work is just the first step in leveraging these insights. We believe that our analysis and proposed defense mechanism will provide a foundation for future research and development of backdoor defenses.
    FedGH: Heterogeneous Federated Learning with Generalized Global Header. (arXiv:2303.13137v1 [cs.LG])
    Federated learning (FL) is an emerging machine learning paradigm that allows multiple parties to train a shared model collaboratively in a privacy-preserving manner. Existing horizontal FL methods generally assume that the FL server and clients hold the same model structure. However, due to system heterogeneity and the need for personalization, enabling clients to hold models with diverse structures has become an important direction. Existing model-heterogeneous FL approaches often require publicly available datasets and incur high communication and/or computational costs, which limit their performances. To address these limitations, we propose the Federated Global prediction Header (FedGH) approach. It is a communication and computation-efficient model-heterogeneous FL framework which trains a shared generalized global prediction header with representations extracted by heterogeneous extractors for clients' models at the FL server. The trained generalized global prediction header learns from different clients. The acquired global knowledge is then transferred to clients to substitute each client's local prediction header. We derive the non-convex convergence rate of FedGH. Extensive experiments on two real-world datasets demonstrate that FedGH achieves significantly more advantageous performance in both model-homogeneous and -heterogeneous FL scenarios compared to seven state-of-the-art personalized FL models, beating the best-performing baseline by up to 8.87% (for model-homogeneous FL) and 1.83% (for model-heterogeneous FL) in terms of average test accuracy, while saving up to 85.53% of communication overhead.
    Chordal Averaging on Flag Manifolds and Its Applications. (arXiv:2303.13501v1 [cs.CV])
    This paper presents a new, provably-convergent algorithm for computing the flag-mean and flag-median of a set of points on a flag manifold under the chordal metric. The flag manifold is a mathematical space consisting of flags, which are sequences of nested subspaces of a vector space that increase in dimension. The flag manifold is a superset of a wide range of known matrix groups, including Stiefel and Grassmanians, making it a general object that is useful in a wide variety computer vision problems. To tackle the challenge of computing first order flag statistics, we first transform the problem into one that involves auxiliary variables constrained to the Stiefel manifold. The Stiefel manifold is a space of orthogonal frames, and leveraging the numerical stability and efficiency of Stiefel-manifold optimization enables us to compute the flag-mean effectively. Through a series of experiments, we show the competence of our method in Grassmann and rotation averaging, as well as principal component analysis.
    Convex and Nonconvex Sublinear Regression with Application to Data-driven Learning of Reach Sets. (arXiv:2210.01919v2 [eess.SY] UPDATED)
    We consider estimating a compact set from finite data by approximating the support function of that set via sublinear regression. Support functions uniquely characterize a compact set up to closure of convexification, and are sublinear (convex as well as positive homogeneous of degree one). Conversely, any sublinear function is the support function of a compact set. We leverage this property to transcribe the task of learning a compact set to that of learning its support function. We propose two algorithms to perform the sublinear regression, one via convex and another via nonconvex programming. The convex programming approach involves solving a quadratic program (QP). The nonconvex programming approach involves training a input sublinear neural network. We illustrate the proposed methods via numerical examples on learning the reach sets of controlled dynamics subject to set-valued input uncertainties from trajectory data.
    Learning and Verification of Task Structure in Instructional Videos. (arXiv:2303.13519v1 [cs.CV])
    Given the enormous number of instructional videos available online, learning a diverse array of multi-step task models from videos is an appealing goal. We introduce a new pre-trained video model, VideoTaskformer, focused on representing the semantics and structure of instructional videos. We pre-train VideoTaskformer using a simple and effective objective: predicting weakly supervised textual labels for steps that are randomly masked out from an instructional video (masked step modeling). Compared to prior work which learns step representations locally, our approach involves learning them globally, leveraging video of the entire surrounding task as context. From these learned representations, we can verify if an unseen video correctly executes a given task, as well as forecast which steps are likely to be taken after a given step. We introduce two new benchmarks for detecting mistakes in instructional videos, to verify if there is an anomalous step and if steps are executed in the right order. We also introduce a long-term forecasting benchmark, where the goal is to predict long-range future steps from a given step. Our method outperforms previous baselines on these tasks, and we believe the tasks will be a valuable way for the community to measure the quality of step representations. Additionally, we evaluate VideoTaskformer on 3 existing benchmarks -- procedural activity recognition, step classification, and step forecasting -- and demonstrate on each that our method outperforms existing baselines and achieves new state-of-the-art performance.
    Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes. (arXiv:2303.13450v1 [cs.CV])
    Recent breakthroughs in text-guided image generation have led to remarkable progress in the field of 3D synthesis from text. By optimizing neural radiance fields (NeRF) directly from text, recent methods are able to produce remarkable results. Yet, these methods are limited in their control of each object's placement or appearance, as they represent the scene as a whole. This can be a major issue in scenarios that require refining or manipulating objects in the scene. To remedy this deficit, we propose a novel GlobalLocal training framework for synthesizing a 3D scene using object proxies. A proxy represents the object's placement in the generated scene and optionally defines its coarse geometry. The key to our approach is to represent each object as an independent NeRF. We alternate between optimizing each NeRF on its own and as part of the full scene. Thus, a complete representation of each object can be learned, while also creating a harmonious scene with style and lighting match. We show that using proxies allows a wide variety of editing options, such as adjusting the placement of each independent object, removing objects from a scene, or refining an object. Our results show that Set-the-Scene offers a powerful solution for scene synthesis and manipulation, filling a crucial gap in controllable text-to-3D synthesis.
    gDDIM: Generalized denoising diffusion implicit models. (arXiv:2206.05564v2 [cs.LG] UPDATED)
    Our goal is to extend the denoising diffusion implicit model (DDIM) to general diffusion models~(DMs) besides isotropic diffusions. Instead of constructing a non-Markov noising process as in the original DDIM, we examine the mechanism of DDIM from a numerical perspective. We discover that the DDIM can be obtained by using some specific approximations of the score when solving the corresponding stochastic differential equation. We present an interpretation of the accelerating effects of DDIM that also explains the advantages of a deterministic sampling scheme over the stochastic one for fast sampling. Building on this insight, we extend DDIM to general DMs, coined generalized DDIM (gDDIM), with a small but delicate modification in parameterizing the score network. We validate gDDIM in two non-isotropic DMs: Blurring diffusion model (BDM) and Critically-damped Langevin diffusion model (CLD). We observe more than 20 times acceleration in BDM. In the CLD, a diffusion model by augmenting the diffusion process with velocity, our algorithm achieves an FID score of 2.26, on CIFAR10, with only 50 number of score function evaluations~(NFEs) and an FID score of 2.86 with only 27 NFEs. Code is available at https://github.com/qsh-zh/gDDIM
    Audio Diffusion Model for Speech Synthesis: A Survey on Text To Speech and Speech Enhancement in Generative AI. (arXiv:2303.13336v1 [cs.SD])
    Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active tasks: text to speech and speech enhancement. This work conducts a survey on audio diffusion model, which is complementary to existing surveys that either lack the recent progress of diffusion-based speech synthesis or highlight an overall picture of applying diffusion model in multiple fields. Specifically, this work first briefly introduces the background of audio and diffusion model. As for the text-to-speech task, we divide the methods into three categories based on the stage where diffusion model is adopted: acoustic model, vocoder and end-to-end framework. Moreover, we categorize various speech enhancement tasks by either certain signals are removed or added into the input speech. Comparisons of experimental results and discussions are also covered in this survey.
    The Low-Rank Simplicity Bias in Deep Networks. (arXiv:2103.10427v4 [cs.LG] UPDATED)
    Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data? In this work, we make a series of empirical observations that investigate and extend the hypothesis that deeper networks are inductively biased to find solutions with lower effective rank embeddings. We conjecture that this bias exists because the volume of functions that maps to low effective rank embedding increases with depth. We show empirically that our claim holds true on finite width linear and non-linear models on practical learning paradigms and show that on natural data, these are often the solutions that generalize well. We then show that the simplicity bias exists at both initialization and after training and is resilient to hyper-parameters and learning methods. We further demonstrate how linear over-parameterization of deep non-linear models can be used to induce low-rank bias, improving generalization performance on CIFAR and ImageNet without changing the modeling capacity.
    TactoFind: A Tactile Only System for Object Retrieval. (arXiv:2303.13482v1 [cs.RO])
    We study the problem of object retrieval in scenarios where visual sensing is absent, object shapes are unknown beforehand and objects can move freely, like grabbing objects out of a drawer. Successful solutions require localizing free objects, identifying specific object instances, and then grasping the identified objects, only using touch feedback. Unlike vision, where cameras can observe the entire scene, touch sensors are local and only observe parts of the scene that are in contact with the manipulator. Moreover, information gathering via touch sensors necessitates applying forces on the touched surface which may disturb the scene itself. Reasoning with touch, therefore, requires careful exploration and integration of information over time -- a challenge we tackle. We present a system capable of using sparse tactile feedback from fingertip touch sensors on a dexterous hand to localize, identify and grasp novel objects without any visual feedback. Videos are available at https://taochenshh.github.io/projects/tactofind.
    Granular-ball Optimization Algorithm. (arXiv:2303.12807v1 [cs.LG])
    The existing intelligent optimization algorithms are designed based on the finest granularity, i.e., a point. This leads to weak global search ability and inefficiency. To address this problem, we proposed a novel multi-granularity optimization algorithm, namely granular-ball optimization algorithm (GBO), by introducing granular-ball computing. GBO uses many granular-balls to cover the solution space. Quite a lot of small and fine-grained granular-balls are used to depict the important parts, and a little number of large and coarse-grained granular-balls are used to depict the inessential parts. Fine multi-granularity data description ability results in a higher global search capability and faster convergence speed. In comparison with the most popular and state-of-the-art algorithms, the experiments on twenty benchmark functions demonstrate its better performance. The faster speed, higher approximation ability of optimal solution, no hyper-parameters, and simpler design of GBO make it an all-around replacement of most of the existing popular intelligent optimization algorithms.
    Generalization with quantum geometry for learning unitaries. (arXiv:2303.13462v1 [quant-ph])
    Generalization is the ability of quantum machine learning models to make accurate predictions on new data by learning from training data. Here, we introduce the data quantum Fisher information metric (DQFIM) to determine when a model can generalize. For variational learning of unitaries, the DQFIM quantifies the amount of circuit parameters and training data needed to successfully train and generalize. We apply the DQFIM to explain when a constant number of training states and polynomial number of parameters are sufficient for generalization. Further, we can improve generalization by removing symmetries from training data. Finally, we show that out-of-distribution generalization, where training and testing data are drawn from different data distributions, can be better than using the same distribution. Our work opens up new approaches to improve generalization in quantum machine learning.
    Persistent Nature: A Generative Model of Unbounded 3D Worlds. (arXiv:2303.13515v1 [cs.CV])
    Despite increasingly realistic image quality, recent 3D image generative models often operate on 3D volumes of fixed extent with limited camera motions. We investigate the task of unconditionally synthesizing unbounded nature scenes, enabling arbitrarily large camera motion while maintaining a persistent 3D world model. Our scene representation consists of an extendable, planar scene layout grid, which can be rendered from arbitrary camera poses via a 3D decoder and volume rendering, and a panoramic skydome. Based on this representation, we learn a generative world model solely from single-view internet photos. Our method enables simulating long flights through 3D landscapes, while maintaining global scene consistency--for instance, returning to the starting point yields the same view of the scene. Our approach enables scene extrapolation beyond the fixed bounds of current 3D generative models, while also supporting a persistent, camera-independent world representation that stands in contrast to auto-regressive 3D prediction models. Our project page: https://chail.github.io/persistent-nature/.
    Almost Sure Convergence of Dropout Algorithms for Neural Networks. (arXiv:2002.02247v2 [math.OC] UPDATED)
    We investigate the convergence and convergence rate of stochastic training algorithms for Neural Networks (NNs) that have been inspired by Dropout (Hinton et al., 2012). With the goal of avoiding overfitting during training of NNs, dropout algorithms consist in practice of multiplying the weight matrices of a NN componentwise by independently drawn random matrices with $\{0, 1 \}$-valued entries during each iteration of Stochastic Gradient Descent (SGD). This paper presents a probability theoretical proof that for fully-connected NNs with differentiable, polynomially bounded activation functions, if we project the weights onto a compact set when using a dropout algorithm, then the weights of the NN converge to a unique stationary point of a projected system of Ordinary Differential Equations (ODEs). After this general convergence guarantee, we go on to investigate the convergence rate of dropout. Firstly, we obtain generic sample complexity bounds for finding $\epsilon$-stationary points of smooth nonconvex functions using SGD with dropout that explicitly depend on the dropout probability. Secondly, we obtain an upper bound on the rate of convergence of Gradient Descent (GD) on the limiting ODEs of dropout algorithms for NNs with the shape of arborescences of arbitrary depth and with linear activation functions. The latter bound shows that for an algorithm such as Dropout or Dropconnect (Wan et al., 2013), the convergence rate can be impaired exponentially by the depth of the arborescence. In contrast, we experimentally observe no such dependence for wide NNs with just a few dropout layers. We also provide a heuristic argument for this observation. Our results suggest that there is a change of scale of the effect of the dropout probability in the convergence rate that depends on the relative size of the width of the NN compared to its depth.
    Reimagining Application User Interface (UI) Design using Deep Learning Methods: Challenges and Opportunities. (arXiv:2303.13055v1 [cs.HC])
    In this paper, we present a review of the recent work in deep learning methods for user interface design. The survey encompasses well known deep learning techniques (deep neural networks, convolutional neural networks, recurrent neural networks, autoencoders, and generative adversarial networks) and datasets widely used to design user interface applications. We highlight important problems and emerging research frontiers in this field. We believe that the use of deep learning for user interface design automation tasks could be one of the high potential fields for the advancement of the software development industry.
    Physics-Embedded Neural Networks: Graph Neural PDE Solvers with Mixed Boundary Conditions. (arXiv:2205.11912v2 [cs.LG] UPDATED)
    Graph neural network (GNN) is a promising approach to learning and predicting physical phenomena described in boundary value problems, such as partial differential equations (PDEs) with boundary conditions. However, existing models inadequately treat boundary conditions essential for the reliable prediction of such problems. In addition, because of the locally connected nature of GNNs, it is difficult to accurately predict the state after a long time, where interaction between vertices tends to be global. We present our approach termed physics-embedded neural networks that considers boundary conditions and predicts the state after a long time using an implicit method. It is built based on an E(n)-equivariant GNN, resulting in high generalization performance on various shapes. We demonstrate that our model learns flow phenomena in complex shapes and outperforms a well-optimized classical solver and a state-of-the-art machine learning model in speed-accuracy trade-off. Therefore, our model can be a useful standard for realizing reliable, fast, and accurate GNN-based PDE solvers. The code is available at https://github.com/yellowshippo/penn-neurips2022.
    A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias. (arXiv:2303.13500v1 [cs.LG])
    Advances in the expressivity of pretrained models have increased interest in the design of adaptation protocols which enable safe and effective transfer learning. Going beyond conventional linear probing (LP) and fine tuning (FT) strategies, protocols that can effectively control feature distortion, i.e., the failure to update features orthogonal to the in-distribution, have been found to achieve improved out-of-distribution generalization (OOD). In order to limit this distortion, the LP+FT protocol, which first learns a linear probe and then uses this initialization for subsequent FT, was proposed. However, in this paper, we find when adaptation protocols (LP, FT, LP+FT) are also evaluated on a variety of safety objectives (e.g., calibration, robustness, etc.), a complementary perspective to feature distortion is helpful to explain protocol behavior. To this end, we study the susceptibility of protocols to simplicity bias (SB), i.e. the well-known propensity of deep neural networks to rely upon simple features, as SB has recently been shown to underlie several problems in robust generalization. Using a synthetic dataset, we demonstrate the susceptibility of existing protocols to SB. Given the strong effectiveness of LP+FT, we then propose modified linear probes that help mitigate SB, and lead to better initializations for subsequent FT. We verify the effectiveness of the proposed LP+FT variants for decreasing SB in a controlled setting, and their ability to improve OOD generalization and safety on three adaptation datasets.
    Ablating Concepts in Text-to-Image Diffusion Models. (arXiv:2303.13516v1 [cs.CV])
    Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.
    Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues. (arXiv:2303.12878v1 [cs.LG])
    As the issue of robustness in AI systems becomes vital, statistical learning techniques that are reliable even in presence of partly contaminated data have to be developed. Preference data, in the form of (complete) rankings in the simplest situations, are no exception and the demand for appropriate concepts and tools is all the more pressing given that technologies fed by or producing this type of data (e.g. search engines, recommending systems) are now massively deployed. However, the lack of vector space structure for the set of rankings (i.e. the symmetric group $\mathfrak{S}_n$) and the complex nature of statistics considered in ranking data analysis make the formulation of robustness objectives in this domain challenging. In this paper, we introduce notions of robustness, together with dedicated statistical methods, for Consensus Ranking the flagship problem in ranking data analysis, aiming at summarizing a probability distribution on $\mathfrak{S}_n$ by a median ranking. Precisely, we propose specific extensions of the popular concept of breakdown point, tailored to consensus ranking, and address the related computational issues. Beyond the theoretical contributions, the relevance of the approach proposed is supported by an experimental study.
    Adiabatic replay for continual learning. (arXiv:2303.13157v1 [cs.LG])
    Conventional replay-based approaches to continual learning (CL) require, for each learning phase with new data, the replay of samples representing all of the previously learned knowledge in order to avoid catastrophic forgetting. Since the amount of learned knowledge grows over time in CL problems, generative replay spends an increasing amount of time just re-learning what is already known. In this proof-of-concept study, we propose a replay-based CL strategy that we term adiabatic replay (AR), which derives its efficiency from the (reasonable) assumption that each new learning phase is adiabatic, i.e., represents only a small addition to existing knowledge. Each new learning phase triggers a sampling process that selectively replays, from the body of existing knowledge, just such samples that are similar to the new data, in contrast to replaying all of it. Complete replay is not required since AR represents the data distribution by GMMs, which are capable of selectively updating their internal representation only where data statistics have changed. As long as additions are adiabatic, the amount of to-be-replayed samples need not to depend on the amount of previously acquired knowledge at all. We verify experimentally that AR is superior to state-of-the-art deep generative replay using VAEs.
    Preference-Aware Constrained Multi-Objective Bayesian Optimization. (arXiv:2303.13034v1 [cs.LG])
    This paper addresses the problem of constrained multi-objective optimization over black-box objective functions with practitioner-specified preferences over the objectives when a large fraction of the input space is infeasible (i.e., violates constraints). This problem arises in many engineering design problems including analog circuits and electric power system design. Our overall goal is to approximate the optimal Pareto set over the small fraction of feasible input designs. The key challenges include the huge size of the design space, multiple objectives and large number of constraints, and the small fraction of feasible input designs which can be identified only after performing expensive simulations. We propose a novel and efficient preference-aware constrained multi-objective Bayesian optimization approach referred to as PAC-MOO to address these challenges. The key idea is to learn surrogate models for both output objectives and constraints, and select the candidate input for evaluation in each iteration that maximizes the information gained about the optimal constrained Pareto front while factoring in the preferences over objectives. Our experiments on two real-world analog circuit design optimization problems demonstrate the efficacy of PAC-MOO over prior methods.
    The effectiveness of MAE pre-pretraining for billion-scale pretraining. (arXiv:2303.13496v1 [cs.CV])
    This paper revisits the standard pretrain-then-finetune paradigm used in computer vision for visual recognition tasks. Typically, state-of-the-art foundation models are pretrained using large scale (weakly) supervised datasets with billions of images. We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model. While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well. Thus, our MAE-based pre-pretraining scales with both model and data size making it applicable for training foundation models. Pre-pretraining consistently improves both the model convergence and the downstream transfer performance across a range of model scales (millions to billions of parameters), and dataset sizes (millions to billions of images). We measure the effectiveness of pre-pretraining on 10 different visual recognition tasks spanning image classification, video recognition, object detection, low-shot classification and zero-shot recognition. Our largest model achieves new state-of-the-art results on iNaturalist-18 (91.3%), 1-shot ImageNet-1k (62.1%), and zero-shot transfer on Food-101 (96.0%). Our study reveals that model initialization plays a significant role, even for web-scale pretraining with billions of images.
    Cube-Based 3D Denoising Diffusion Probabilistic Model for Cone Beam Computed Tomography Reconstruction with Incomplete Data. (arXiv:2303.12861v1 [eess.IV])
    Deep learning (DL) has been extensively researched in the field of computed tomography (CT) reconstruction with incomplete data, particularly in sparse-view CT reconstruction. However, applying DL to sparse-view cone beam CT (CBCT) remains challenging. Many models learn the mapping from sparse-view CT images to ground truth but struggle to achieve satisfactory performance in terms of global artifact removal. Incorporating sinogram data and utilizing dual-domain information can enhance anti-artifact performance, but this requires storing the entire sinogram in memory. This presents a memory issue for high-resolution CBCT sinograms, limiting further research and application. In this paper, we propose a cube-based 3D denoising diffusion probabilistic model (DDPM) for CBCT reconstruction using down-sampled data. A DDPM network, trained on cubes extracted from paired fully sampled sinograms and down-sampled sinograms, is employed to inpaint down-sampled sinograms. Our method divides the entire sinogram into overlapping cubes and processes these cubes in parallel using multiple GPUs, overcoming memory limitations. Experimental results demonstrate that our approach effectively suppresses few-view artifacts while preserving textural details faithfully.
    FTSO: Effective NAS via First Topology Second Operator. (arXiv:2303.12948v1 [cs.CV])
    Existing one-shot neural architecture search (NAS) methods have to conduct a search over a giant super-net, which leads to the huge computational cost. To reduce such cost, in this paper, we propose a method, called FTSO, to divide the whole architecture search into two sub-steps. Specifically, in the first step, we only search for the topology, and in the second step, we search for the operators. FTSO not only reduces NAS's search time from days to 0.68 seconds, but also significantly improves the found architecture's accuracy. Our extensive experiments on ImageNet show that within 18 seconds, FTSO can achieve a 76.4% testing accuracy, 1.5% higher than the SOTA, PC-DARTS. In addition, FTSO can reach a 97.77% testing accuracy, 0.27% higher than the SOTA, with nearly 100% (99.8%) search time saved, when searching on CIFAR10.
    Human Behavior in the Time of COVID-19: Learning from Big Data. (arXiv:2303.13452v1 [cs.CY])
    Since the World Health Organization (WHO) characterized COVID-19 as a pandemic in March 2020, there have been over 600 million confirmed cases of COVID-19 and more than six million deaths as of October 2022. The relationship between the COVID-19 pandemic and human behavior is complicated. On one hand, human behavior is found to shape the spread of the disease. On the other hand, the pandemic has impacted and even changed human behavior in almost every aspect. To provide a holistic understanding of the complex interplay between human behavior and the COVID-19 pandemic, researchers have been employing big data techniques such as natural language processing, computer vision, audio signal processing, frequent pattern mining, and machine learning. In this study, we present an overview of the existing studies on using big data techniques to study human behavior in the time of the COVID-19 pandemic. In particular, we categorize these studies into three groups - using big data to measure, model, and leverage human behavior, respectively. The related tasks, data, and methods are summarized accordingly. To provide more insights into how to fight the COVID-19 pandemic and future global catastrophes, we further discuss challenges and potential opportunities.
    Reevaluating Data Partitioning for Emotion Detection in EmoWOZ. (arXiv:2303.13364v1 [cs.CL])
    This paper focuses on the EmoWoz dataset, an extension of MultiWOZ that provides emotion labels for the dialogues. MultiWOZ was partitioned initially for another purpose, resulting in a distributional shift when considering the new purpose of emotion recognition. The emotion tags in EmoWoz are highly imbalanced and unevenly distributed across the partitions, which causes sub-optimal performance and poor comparison of models. We propose a stratified sampling scheme based on emotion tags to address this issue, improve the dataset's distribution, and reduce dataset shift. We also introduce a special technique to handle conversation (sequential) data with many emotional tags. Using our proposed sampling method, models built upon EmoWoz can perform better, making it a more reliable resource for training conversational agents with emotional intelligence. We recommend that future researchers use this new partitioning to ensure consistent and accurate performance evaluations.
    Take 5: Interpretable Image Classification with a Handful of Features. (arXiv:2303.13166v1 [cs.CV])
    Deep Neural Networks use thousands of mostly incomprehensible features to identify a single class, a decision no human can follow. We propose an interpretable sparse and low dimensional final decision layer in a deep neural network with measurable aspects of interpretability and demonstrate it on fine-grained image classification. We argue that a human can only understand the decision of a machine learning model, if the features are interpretable and only very few of them are used for a single decision. For that matter, the final layer has to be sparse and, to make interpreting the features feasible, low dimensional. We call a model with a Sparse Low-Dimensional Decision SLDD-Model. We show that a SLDD-Model is easier to interpret locally and globally than a dense high-dimensional decision layer while being able to maintain competitive accuracy. Additionally, we propose a loss function that improves a model's feature diversity and accuracy. Our more interpretable SLDD-Model only uses 5 out of just 50 features per class, while maintaining 97% to 100% of the accuracy on four common benchmark datasets compared to the baseline model with 2048 features.
    Towards the Scalable Evaluation of Cooperativeness in Language Models. (arXiv:2303.13360v1 [cs.CL])
    It is likely that AI systems driven by pre-trained language models (PLMs) will increasingly be used to assist humans in high-stakes interactions with other agents, such as negotiation or conflict resolution. Consistent with the goals of Cooperative AI \citep{dafoe_open_2020}, we wish to understand and shape the multi-agent behaviors of PLMs in a pro-social manner. An important first step is the evaluation of model behaviour across diverse cooperation problems. Since desired behaviour in an interaction depends upon precise game-theoretic structure, we focus on generating scenarios with particular structures with both crowdworkers and a language model. Our work proceeds as follows. First, we discuss key methodological issues in the generation of scenarios corresponding to particular game-theoretic structures. Second, we employ both crowdworkers and a language model to generate such scenarios. We find that the quality of generations tends to be mediocre in both cases. We additionally get both crowdworkers and a language model to judge whether given scenarios align with their intended game-theoretic structure, finding mixed results depending on the game. Third, we provide a dataset of scenario based on our data generated. We provide both quantitative and qualitative evaluations of UnifiedQA and GPT-3 on this dataset. We find that instruct-tuned models tend to act in a way that could be perceived as cooperative when scaled up, while other models seemed to have flat scaling trends.
    SC-MIL: Supervised Contrastive Multiple Instance Learning for Imbalanced Classification in Pathology. (arXiv:2303.13405v1 [cs.CV])
    Multiple Instance learning (MIL) models have been extensively used in pathology to predict biomarkers and risk-stratify patients from gigapixel-sized images. Machine learning problems in medical imaging often deal with rare diseases, making it important for these models to work in a label-imbalanced setting. Furthermore, these imbalances can occur in out-of-distribution (OOD) datasets when the models are deployed in the real-world. We leverage the idea that decoupling feature and classifier learning can lead to improved decision boundaries for label imbalanced datasets. To this end, we investigate the integration of supervised contrastive learning with multiple instance learning (SC-MIL). Specifically, we propose a joint-training MIL framework in the presence of label imbalance that progressively transitions from learning bag-level representations to optimal classifier learning. We perform experiments with different imbalance settings for two well-studied problems in cancer pathology: subtyping of non-small cell lung cancer and subtyping of renal cell carcinoma. SC-MIL provides large and consistent improvements over other techniques on both in-distribution (ID) and OOD held-out sets across multiple imbalanced settings.
    Enriching Neural Network Training Dataset to Improve Worst-Case Performance Guarantees. (arXiv:2303.13228v1 [cs.LG])
    Machine learning algorithms, especially Neural Networks (NNs), are a valuable tool used to approximate non-linear relationships, like the AC-Optimal Power Flow (AC-OPF), with considerable accuracy -- and achieving a speedup of several orders of magnitude when deployed for use. Often in power systems literature, the NNs are trained with a fixed dataset generated prior to the training process. In this paper, we show that adapting the NN training dataset during training can improve the NN performance and substantially reduce its worst-case violations. This paper proposes an algorithm that identifies and enriches the training dataset with critical datapoints that reduce the worst-case violations and deliver a neural network with improved worst-case performance guarantees. We demonstrate the performance of our algorithm in four test power systems, ranging from 39-buses to 162-buses.
    Adversarial Robustness of Learning-based Static Malware Classifiers. (arXiv:2303.13372v1 [cs.CR])
    Malware detection has long been a stage for an ongoing arms race between malware authors and anti-virus systems. Solutions that utilize machine learning (ML) gain traction as the scale of this arms race increases. This trend, however, makes performing attacks directly on ML an attractive prospect for adversaries. We study this arms race from both perspectives in the context of MalConv, a popular convolutional neural network-based malware classifier that operates on raw bytes of files. First, we show that MalConv is vulnerable to adversarial patch attacks: appending a byte-level patch to malware files bypasses detection 94.3% of the time. Moreover, we develop a universal adversarial patch (UAP) attack where a single patch can drop the detection rate in constant time of any malware file that contains it by 80%. These patches are effective even being relatively small with respect to the original file size -- between 2%-8%. As a countermeasure, we then perform window ablation that allows us to apply de-randomized smoothing, a modern certified defense to patch attacks in vision tasks, to raw files. The resulting `smoothed-MalConv' can detect over 80% of malware that contains the universal patch and provides certified robustness up to 66%, outlining a promising step towards robust malware detection. To our knowledge, we are the first to apply universal adversarial patch attack and certified defense using ablations on byte level in the malware field.
    Decentralized Adversarial Training over Graphs. (arXiv:2303.13326v1 [cs.LG])
    The vulnerability of machine learning models to adversarial attacks has been attracting considerable attention in recent years. Most existing studies focus on the behavior of stand-alone single-agent learners. In comparison, this work studies adversarial training over graphs, where individual agents are subjected to perturbations of varied strength levels across space. It is expected that interactions by linked agents, and the heterogeneity of the attack models that are possible over the graph, can help enhance robustness in view of the coordination power of the group. Using a min-max formulation of diffusion learning, we develop a decentralized adversarial training framework for multi-agent systems. We analyze the convergence properties of the proposed scheme for both convex and non-convex environments, and illustrate the enhanced robustness to adversarial attacks.
    Continuous Indeterminate Probability Neural Network. (arXiv:2303.12964v1 [cs.LG])
    This paper introduces a general model called CIPNN - Continuous Indeterminate Probability Neural Network, and this model is based on IPNN, which is used for discrete latent random variables. Currently, posterior of continuous latent variables is regarded as intractable, with the new theory proposed by IPNN this problem can be solved. Our contributions are Four-fold. First, we derive the analytical solution of the posterior calculation of continuous latent random variables and propose a general classification model (CIPNN). Second, we propose a general auto-encoder called CIPAE - Continuous Indeterminate Probability Auto-Encoder, the decoder part is not a neural network and uses a fully probabilistic inference model for the first time. Third, we propose a new method to visualize the latent random variables, we use one of N dimensional latent variables as a decoder to reconstruct the input image, which can work even for classification tasks, in this way, we can see what each latent variable has learned. Fourth, IPNN has shown great classification capability, CIPNN has pushed this classification capability to infinity. Theoretical advantages are reflected in experimental results.
    SignCRF: Scalable Channel-agnostic Data-driven Radio Authentication System. (arXiv:2303.12811v1 [cs.CR])
    Radio Frequency Fingerprinting through Deep Learning (RFFDL) is a data-driven IoT authentication technique that leverages the unique hardware-level manufacturing imperfections associated with a particular device to recognize (fingerprint) the device based on variations introduced in the transmitted waveform. The proposed SignCRF is a scalable, channel-agnostic, data-driven radio authentication platform with unmatched precision in fingerprinting wireless devices based on their unique manufacturing impairments and independent of the dynamic channel irregularities caused by mobility. SignCRF consists of (i) a baseline classifier finely trained to authenticate devices with high accuracy and at scale; (ii) an environment translator carefully designed and trained to remove the dynamic channel impact from RF signals while maintaining the radio's specific signature; (iii) a Max-Rule module that selects the highest precision authentication technique between the baseline classifier and the environment translator per radio. We design, train, and validate the performance of SignCRF for multiple technologies in dynamic environments and at scale (100 LoRa and 20 WiFi devices). We demonstrate that SignCRF significantly improves the RFFDL performance by achieving as high as 5x and 8x improvement in correct authentication of WiFi and LoRa devices when compared to the state-of-the-art, respectively.
    TRON: Transformer Neural Network Acceleration with Non-Coherent Silicon Photonics. (arXiv:2303.12914v1 [cs.LG])
    Transformer neural networks are rapidly being integrated into state-of-the-art solutions for natural language processing (NLP) and computer vision. However, the complex structure of these models creates challenges for accelerating their execution on conventional electronic platforms. We propose the first silicon photonic hardware neural network accelerator called TRON for transformer-based models such as BERT, and Vision Transformers. Our analysis demonstrates that TRON exhibits at least 14x better throughput and 8x better energy efficiency, in comparison to state-of-the-art transformer accelerators.
    Increasing Textual Context Size Boosts Medical Image-Text Matching. (arXiv:2303.13340v1 [cs.LG])
    This short technical report demonstrates a simple technique that yields state of the art results in medical image-text matching tasks. We analyze the use of OpenAI's CLIP, a general image-text matching model, and observe that CLIP's limited textual input size has negative impact on downstream performance in the medical domain where encoding longer textual contexts is often required. We thus train and release ClipMD, which is trained with a simple sliding window technique to encode textual captions. ClipMD was tested on two medical image-text datasets and compared with other image-text matching models. The results show that ClipMD outperforms other models on both datasets by a large margin. We make our code and pretrained model publicly available.
    MSAT: Biologically Inspired Multi-Stage Adaptive Threshold for Conversion of Spiking Neural Networks. (arXiv:2303.13080v1 [cs.NE])
    Spiking Neural Networks (SNNs) can do inference with low power consumption due to their spike sparsity. ANN-SNN conversion is an efficient way to achieve deep SNNs by converting well-trained Artificial Neural Networks (ANNs). However, the existing methods commonly use constant threshold for conversion, which prevents neurons from rapidly delivering spikes to deeper layers and causes high time delay. In addition, the same response for different inputs may result in information loss during the information transmission. Inspired by the biological model mechanism, we propose a multi-stage adaptive threshold (MSAT). Specifically, for each neuron, the dynamic threshold varies with firing history and input properties and is positively correlated with the average membrane potential and negatively correlated with the rate of depolarization. The self-adaptation to membrane potential and input allows a timely adjustment of the threshold to fire spike faster and transmit more information. Moreover, we analyze the Spikes of Inactivated Neurons error which is pervasive in early time steps and propose spike confidence accordingly as a measurement of confidence about the neurons that correctly deliver spikes. We use such spike confidence in early time steps to determine whether to elicit spike to alleviate this error. Combined with the proposed method, we examine the performance on non-trivial datasets CIFAR-10, CIFAR-100, and ImageNet. We also conduct sentiment classification and speech recognition experiments on the IDBM and Google speech commands datasets respectively. Experiments show near-lossless and lower latency ANN-SNN conversion. To the best of our knowledge, this is the first time to build a biologically inspired multi-stage adaptive threshold for converted SNN, with comparable performance to state-of-the-art methods while improving energy efficiency.
    Laplacian Segmentation Networks: Improved Epistemic Uncertainty from Spatial Aleatoric Uncertainty. (arXiv:2303.13123v1 [cs.CV])
    Out of distribution (OOD) medical images are frequently encountered, e.g. because of site- or scanner differences, or image corruption. OOD images come with a risk of incorrect image segmentation, potentially negatively affecting downstream diagnoses or treatment. To ensure robustness to such incorrect segmentations, we propose Laplacian Segmentation Networks (LSN) that jointly model epistemic (model) and aleatoric (data) uncertainty in image segmentation. We capture data uncertainty with a spatially correlated logit distribution. For model uncertainty, we propose the first Laplace approximation of the weight posterior that scales to large neural networks with skip connections that have high-dimensional outputs. Empirically, we demonstrate that modelling spatial pixel correlation allows the Laplacian Segmentation Network to successfully assign high epistemic uncertainty to out-of-distribution objects appearing within images.
    Xplainer: From X-Ray Observations to Explainable Zero-Shot Diagnosis. (arXiv:2303.13391v1 [cs.CV])
    Automated diagnosis prediction from medical images is a valuable resource to support clinical decision-making. However, such systems usually need to be trained on large amounts of annotated data, which often is scarce in the medical domain. Zero-shot methods address this challenge by allowing a flexible adaption to new settings with different clinical findings without relying on labeled data. Further, to integrate automated diagnosis in the clinical workflow, methods should be transparent and explainable, increasing medical professionals' trust and facilitating correctness verification. In this work, we introduce Xplainer, a novel framework for explainable zero-shot diagnosis in the clinical setting. Xplainer adapts the classification-by-description approach of contrastive vision-language models to the multi-label medical diagnosis task. Specifically, instead of directly predicting a diagnosis, we prompt the model to classify the existence of descriptive observations, which a radiologist would look for on an X-Ray scan, and use the descriptor probabilities to estimate the likelihood of a diagnosis. Our model is explainable by design, as the final diagnosis prediction is directly based on the prediction of the underlying descriptors. We evaluate Xplainer on two chest X-ray datasets, CheXpert and ChestX-ray14, and demonstrate its effectiveness in improving the performance and explainability of zero-shot diagnosis. Our results suggest that Xplainer provides a more detailed understanding of the decision-making process and can be a valuable tool for clinical diagnosis.
    The power and limitations of learning quantum dynamics incoherently. (arXiv:2303.12834v1 [quant-ph])
    Quantum process learning is emerging as an important tool to study quantum systems. While studied extensively in coherent frameworks, where the target and model system can share quantum information, less attention has been paid to whether the dynamics of quantum systems can be learned without the system and target directly interacting. Such incoherent frameworks are practically appealing since they open up methods of transpiling quantum processes between the different physical platforms without the need for technically challenging hybrid entanglement schemes. Here we provide bounds on the sample complexity of learning unitary processes incoherently by analyzing the number of measurements that are required to emulate well-established coherent learning strategies. We prove that if arbitrary measurements are allowed, then any efficiently representable unitary can be efficiently learned within the incoherent framework; however, when restricted to shallow-depth measurements only low-entangling unitaries can be learned. We demonstrate our incoherent learning algorithm for low entangling unitaries by successfully learning a 16-qubit unitary on \texttt{ibmq\_kolkata}, and further demonstrate the scalabilty of our proposed algorithm through extensive numerical experiments.
    Semantic Image Attack for Visual Model Diagnosis. (arXiv:2303.13010v1 [cs.CV])
    In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models. This is partially due to the fact that obtaining a balanced, diverse, and perfectly labeled dataset is typically expensive, time-consuming, and error-prone. Rather than relying on a carefully designed test set to assess ML models' failures, fairness, or robustness, this paper proposes Semantic Image Attack (SIA), a method based on the adversarial attack that provides semantic adversarial images to allow model diagnosis, interpretability, and robustness. Traditional adversarial training is a popular methodology for robustifying ML models against attacks. However, existing adversarial methods do not combine the two aspects that enable the interpretation and analysis of the model's flaws: semantic traceability and perceptual quality. SIA combines the two features via iterative gradient ascent on a predefined semantic attribute space and the image space. We illustrate the validity of our approach in three scenarios for keypoint detection and classification. (1) Model diagnosis: SIA generates a histogram of attributes that highlights the semantic vulnerability of the ML model (i.e., attributes that make the model fail). (2) Stronger attacks: SIA generates adversarial examples with visually interpretable attributes that lead to higher attack success rates than baseline methods. The adversarial training on SIA improves the transferable robustness across different gradient-based attacks. (3) Robustness to imbalanced datasets: we use SIA to augment the underrepresented classes, which outperforms strong augmentation and re-balancing baselines.
    RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research. (arXiv:2303.13117v1 [math.OC])
    Reinforcement learning has been applied in operation research and has shown promise in solving large combinatorial optimization problems. However, existing works focus on developing neural network architectures for certain problems. These works lack the flexibility to incorporate recent advances in reinforcement learning, as well as the flexibility of customizing model architectures for operation research problems. In this work, we analyze the end-to-end autoregressive models for vehicle routing problems and show that these models can benefit from the recent advances in reinforcement learning with a careful re-implementation of the model architecture. In particular, we re-implemented the Attention Model and trained it with Proximal Policy Optimization (PPO) in CleanRL, showing at least 8 times speed up in training time. We hereby introduce RLOR, a flexible framework for Deep Reinforcement Learning for Operation Research. We believe that a flexible framework is key to developing deep reinforcement learning models for operation research problems. The code of our work is publicly available at https://github.com/cpwan/RLOR.
    Towards A Visual Programming Tool to Create Deep Learning Models. (arXiv:2303.12821v1 [cs.HC])
    Deep Learning (DL) developers come from different backgrounds, e.g., medicine, genomics, finance, and computer science. To create a DL model, they must learn and use high-level programming languages (e.g., Python), thus needing to handle related setups and solve programming errors. This paper presents DeepBlocks, a visual programming tool that allows DL developers to design, train, and evaluate models without relying on specific programming languages. DeepBlocks works by building on the typical model structure: a sequence of learnable functions whose arrangement defines the specific characteristics of the model. We derived DeepBlocks' design goals from a 5-participants formative interview, and we validated the first implementation of the tool through a typical use case. Results are promising and show that developers could visually design complex DL architectures.
    Fault Prognosis of Turbofan Engines: Eventual Failure Prediction and Remaining Useful Life Estimation. (arXiv:2303.12982v1 [cs.LG])
    In the era of industrial big data, prognostics and health management is essential to improve the prediction of future failures to minimize inventory, maintenance, and human costs. Used for the 2021 PHM Data Challenge, the new Commercial Modular Aero-Propulsion System Simulation dataset from NASA is an open-source benchmark containing simulated turbofan engine units flown under realistic flight conditions. Deep learning approaches implemented previously for this application attempt to predict the remaining useful life of the engine units, but have not utilized labeled failure mode information, impeding practical usage and explainability. To address these limitations, a new prognostics approach is formulated with a customized loss function to simultaneously predict the current health state, the eventual failing component(s), and the remaining useful life. The proposed method incorporates principal component analysis to orthogonalize statistical time-domain features, which are inputs into supervised regressors such as random forests, extreme random forests, XGBoost, and artificial neural networks. The highest performing algorithm, ANN-Flux, achieves AUROC and AUPR scores exceeding 0.95 for each classification. In addition, ANN-Flux reduces the remaining useful life RMSE by 38% for the same test split of the dataset compared to past work, with significantly less computational cost.
    Self-distillation for surgical action recognition. (arXiv:2303.12915v1 [cs.CV])
    Surgical scene understanding is a key prerequisite for contextaware decision support in the operating room. While deep learning-based approaches have already reached or even surpassed human performance in various fields, the task of surgical action recognition remains a major challenge. With this contribution, we are the first to investigate the concept of self-distillation as a means of addressing class imbalance and potential label ambiguity in surgical video analysis. Our proposed method is a heterogeneous ensemble of three models that use Swin Transfomers as backbone and the concepts of self-distillation and multi-task learning as core design choices. According to ablation studies performed with the CholecT45 challenge data via cross-validation, the biggest performance boost is achieved by the usage of soft labels obtained by self-distillation. External validation of our method on an independent test set was achieved by providing a Docker container of our inference model to the challenge organizers. According to their analysis, our method outperforms all other solutions submitted to the latest challenge in the field. Our approach thus shows the potential of self-distillation for becoming an important tool in medical image analysis applications.
    Revisiting the Fragility of Influence Functions. (arXiv:2303.12922v1 [cs.LG])
    In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, which is a method that approximates the effect that leave-one-out training has on the loss function, has been shown to be fragile. The proposed reason for their fragility remains unclear. Although previous work suggests the use of regularization to increase robustness, this does not hold in all cases. In this work, we seek to investigate the experiments performed in the prior work in an effort to understand the underlying mechanisms of influence function fragility. First, we verify influence functions using procedures from the literature under conditions where the convexity assumptions of influence functions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Here, we analyze the key metrics and procedures that are used to validate influence functions. Our results indicate that the validation procedures may cause the observed fragility.
    Controllable Inversion of Black-Box Face-Recognition Models via Diffusion. (arXiv:2303.13006v1 [cs.CV])
    Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e. black-box setting). A variety of methods have been proposed in literature for this task, but they have serious shortcomings such as a lack of realistic outputs, long inference times, and strong requirements for the data set and accessibility of the face recognition model. Through an analysis of the black-box inversion problem, we show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution even without an identity-specific loss. Our method, named identity denoising diffusion probabilistic model (ID3PM), leverages the stochastic nature of the denoising diffusion process to produce high-quality, identity-preserving face images with various backgrounds, lighting, poses, and expressions. We demonstrate state-of-the-art performance in terms of identity preservation and diversity both qualitatively and quantitatively. Our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process and does not suffer from any of the common shortcomings from competing methods.
    Automated Federated Learning in Mobile Edge Networks -- Fast Adaptation and Convergence. (arXiv:2303.12999v1 [cs.LG])
    Federated Learning (FL) can be used in mobile edge networks to train machine learning models in a distributed manner. Recently, FL has been interpreted within a Model-Agnostic Meta-Learning (MAML) framework, which brings FL significant advantages in fast adaptation and convergence over heterogeneous datasets. However, existing research simply combines MAML and FL without explicitly addressing how much benefit MAML brings to FL and how to maximize such benefit over mobile edge networks. In this paper, we quantify the benefit from two aspects: optimizing FL hyperparameters (i.e., sampled data size and the number of communication rounds) and resource allocation (i.e., transmit power) in mobile edge networks. Specifically, we formulate the MAML-based FL design as an overall learning time minimization problem, under the constraints of model accuracy and energy consumption. Facilitated by the convergence analysis of MAML-based FL, we decompose the formulated problem and then solve it using analytical solutions and the coordinate descent method. With the obtained FL hyperparameters and resource allocation, we design a MAML-based FL algorithm, called Automated Federated Learning (AutoFL), that is able to conduct fast adaptation and convergence. Extensive experimental results verify that AutoFL outperforms other benchmark algorithms regarding the learning time and convergence performance.
    It is all Connected: A New Graph Formulation for Spatio-Temporal Forecasting. (arXiv:2303.13177v1 [cs.LG])
    With an ever-increasing number of sensors in modern society, spatio-temporal time series forecasting has become a de facto tool to make informed decisions about the future. Most spatio-temporal forecasting models typically comprise distinct components that learn spatial and temporal dependencies. A common methodology employs some Graph Neural Network (GNN) to capture relations between spatial locations, while another network, such as a recurrent neural network (RNN), learns temporal correlations. By representing every recorded sample as its own node in a graph, rather than all measurements for a particular location as a single node, temporal and spatial information is encoded in a similar manner. In this setting, GNNs can now directly learn both temporal and spatial dependencies, jointly, while also alleviating the need for additional temporal networks. Furthermore, the framework does not require aligned measurements along the temporal dimension, meaning that it also naturally facilitates irregular time series, different sampling frequencies or missing data, without the need for data imputation. To evaluate the proposed methodology, we consider wind speed forecasting as a case study, where our proposed framework outperformed other spatio-temporal models using GNNs with either Transformer or LSTM networks as temporal update functions.
    Improving Generalization with Domain Convex Game. (arXiv:2303.13297v1 [cs.CV])
    Domain generalization (DG) tends to alleviate the poor generalization capability of deep neural networks by learning model with multiple source domains. A classical solution to DG is domain augmentation, the common belief of which is that diversifying source domains will be conducive to the out-of-distribution generalization. However, these claims are understood intuitively, rather than mathematically. Our explorations empirically reveal that the correlation between model generalization and the diversity of domains may be not strictly positive, which limits the effectiveness of domain augmentation. This work therefore aim to guarantee and further enhance the validity of this strand. To this end, we propose a new perspective on DG that recasts it as a convex game between domains. We first encourage each diversified domain to enhance model generalization by elaborately designing a regularization term based on supermodularity. Meanwhile, a sample filter is constructed to eliminate low-quality samples, thereby avoiding the impact of potentially harmful information. Our framework presents a new avenue for the formal analysis of DG, heuristic analysis and extensive experiments demonstrate the rationality and effectiveness.
    TSI-GAN: Unsupervised Time Series Anomaly Detection using Convolutional Cycle-Consistent Generative Adversarial Networks. (arXiv:2303.12952v1 [cs.LG])
    Anomaly detection is widely used in network intrusion detection, autonomous driving, medical diagnosis, credit card frauds, etc. However, several key challenges remain open, such as lack of ground truth labels, presence of complex temporal patterns, and generalizing over different datasets. This paper proposes TSI-GAN, an unsupervised anomaly detection model for time-series that can learn complex temporal patterns automatically and generalize well, i.e., no need for choosing dataset-specific parameters, making statistical assumptions about underlying data, or changing model architectures. To achieve these goals, we convert each input time-series into a sequence of 2D images using two encoding techniques with the intent of capturing temporal patterns and various types of deviance. Moreover, we design a reconstructive GAN that uses convolutional layers in an encoder-decoder network and employs cycle-consistency loss during training to ensure that inverse mappings are accurate as well. In addition, we also instrument a Hodrick-Prescott filter in post-processing to mitigate false positives. We evaluate TSI-GAN using 250 well-curated and harder-than-usual datasets and compare with 8 state-of-the-art baseline methods. The results demonstrate the superiority of TSI-GAN to all the baselines, offering an overall performance improvement of 13% and 31% over the second-best performer MERLIN and the third-best performer LSTM-AE, respectively.
    Predicting the Initial Conditions of the Universe using Deep Learning. (arXiv:2303.13056v1 [astro-ph.CO])
    Finding the initial conditions that led to the current state of the universe is challenging because it involves searching over a vast input space of initial conditions, along with modeling their evolution via tools such as N-body simulations which are computationally expensive. Deep learning has emerged as an alternate modeling tool that can learn the mapping between the linear input of an N-body simulation and the final nonlinear displacements at redshift zero, which can significantly accelerate the forward modeling. However, this does not help reduce the search space for initial conditions. In this paper, we demonstrate for the first time that a deep learning model can be trained for the reverse mapping. We train a V-Net based convolutional neural network, which outputs the linear displacement of an N-body system, given the current time nonlinear displacement and the cosmological parameters of the system. We demonstrate that this neural network accurately recovers the initial linear displacement field over a wide range of scales ($<1$-$2\%$ error up to nearly $k = 1\ \mathrm{Mpc}^{-1}\,h$), despite the ill-defined nature of the inverse problem at smaller scales. Specifically, smaller scales are dominated by nonlinear effects which makes the backward dynamics much more susceptible to numerical and computational errors leading to highly divergent backward trajectories and a one-to-many backward mapping. The results of our method motivate that neural network based models can act as good approximators of the initial linear states and their predictions can serve as good starting points for sampling-based methods to infer the initial states of the universe.
    An Empirical Analysis of the Shift and Scale Parameters in BatchNorm. (arXiv:2303.12818v1 [cs.LG])
    Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks, especially Convolutional Neural Networks (CNN). It has been empirically demonstrated that BatchNorm increases performance, stability, and accuracy, although the reasons for such improvements are unclear. BatchNorm includes a normalization step as well as trainable shift and scale parameters. In this paper, we empirically examine the relative contribution to the success of BatchNorm of the normalization step, as compared to the re-parameterization via shifting and scaling. To conduct our experiments, we implement two new optimizers in PyTorch, namely, a version of BatchNorm that we refer to as AffineLayer, which includes the re-parameterization step without normalization, and a version with just the normalization step, that we call BatchNorm-minus. We compare the performance of our AffineLayer and BatchNorm-minus implementations to standard BatchNorm, and we also compare these to the case where no batch normalization is used. We experiment with four ResNet architectures (ResNet18, ResNet34, ResNet50, and ResNet101) over a standard image dataset and multiple batch sizes. Among other findings, we provide empirical evidence that the success of BatchNorm may derive primarily from improved weight initialization.
    Test-time Defense against Adversarial Attacks: Detection and Reconstruction of Adversarial Examples via Masked Autoencoder. (arXiv:2303.12848v1 [cs.CV])
    Existing defense methods against adversarial attacks can be categorized into training time and test time defenses. Training time defense, i.e., adversarial training, requires a significant amount of extra time for training and is often not able to be generalized to unseen attacks. On the other hand, test time defense by test time weight adaptation requires access to perform gradient descent on (part of) the model weights, which could be infeasible for models with frozen weights. To address these challenges, we propose DRAM, a novel defense method to Detect and Reconstruct multiple types of Adversarial attacks via Masked autoencoder (MAE). We demonstrate how to use MAE losses to build a KS-test to detect adversarial attacks. Moreover, the MAE losses can be used to repair adversarial samples from unseen attack types. In this sense, DRAM neither requires model weight updates in test time nor augments the training set with more adversarial samples. Evaluating DRAM on the large-scale ImageNet data, we achieve the best detection rate of 82% on average on eight types of adversarial attacks compared with other detection baselines. For reconstruction, DRAM improves the robust accuracy by 6% ~ 41% for Standard ResNet50 and 3% ~ 8% for Robust ResNet50 compared with other self-supervision tasks, such as rotation prediction and contrastive learning.  ( 2 min )
    Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective. (arXiv:2303.13299v1 [cs.LG])
    As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner is a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods for giving each feature in an input a score corresponding to its influence on a model's output. A major limitation of this family of explainers in practice is that they can disagree on which features are more important than others. Our contribution in this paper is a method of training models with this disagreement problem in mind. We do this by introducing a Post hoc Explainer Agreement Regularization (PEAR) loss term alongside the standard term corresponding to accuracy, an additional term that measures the difference in feature attribution between a pair of explainers. We observe on three datasets that we can train a model with this loss term to improve explanation consensus on unseen data, and see improved consensus between explainers other than those used in the loss term. We examine the trade-off between improved consensus and model performance. And finally, we study the influence our method has on feature attribution explanations.
    Adversarially Contrastive Estimation of Conditional Neural Processes. (arXiv:2303.13004v1 [cs.LG])
    Conditional Neural Processes~(CNPs) formulate distributions over functions and generate function observations with exact conditional likelihoods. CNPs, however, have limited expressivity for high-dimensional observations, since their predictive distribution is factorized into a product of unconstrained (typically) Gaussian outputs. Previously, this could be handled using latent variables or autoregressive likelihood, but at the expense of intractable training and quadratically increased complexity. Instead, we propose calibrating CNPs with an adversarial training scheme besides regular maximum likelihood estimates. Specifically, we train an energy-based model (EBM) with noise contrastive estimation, which enforces EBM to identify true observations from the generations of CNP. In this way, CNP must generate predictions closer to the ground-truth to fool EBM, instead of merely optimizing with respect to the fixed-form likelihood. From generative function reconstruction to downstream regression and classification tasks, we demonstrate that our method fits mainstream CNP members, showing effectiveness when unconstrained Gaussian likelihood is defined, requiring minimal computation overhead while preserving foundation properties of CNPs.
    Box-Level Active Detection. (arXiv:2303.13089v1 [cs.CV])
    Active learning selects informative samples for annotation within budget, which has proven efficient recently on object detection. However, the widely used active detection benchmarks conduct image-level evaluation, which is unrealistic in human workload estimation and biased towards crowded images. Furthermore, existing methods still perform image-level annotation, but equally scoring all targets within the same image incurs waste of budget and redundant labels. Having revealed above problems and limitations, we introduce a box-level active detection framework that controls a box-based budget per cycle, prioritizes informative targets and avoids redundancy for fair comparison and efficient application. Under the proposed box-level setting, we devise a novel pipeline, namely Complementary Pseudo Active Strategy (ComPAS). It exploits both human annotations and the model intelligence in a complementary fashion: an efficient input-end committee queries labels for informative objects only; meantime well-learned targets are identified by the model and compensated with pseudo-labels. ComPAS consistently outperforms 10 competitors under 4 settings in a unified codebase. With supervision from labeled data only, it achieves 100% supervised performance of VOC0712 with merely 19% box annotations. On the COCO dataset, it yields up to 4.3% mAP improvement over the second-best method. ComPAS also supports training with the unlabeled pool, where it surpasses 90% COCO supervised performance with 85% label reduction. Our source code is publicly available at https://github.com/lyumengyao/blad.
    A Survey of Historical Learning: Learning Models with Learning History. (arXiv:2303.12992v1 [cs.LG])
    New knowledge originates from the old. The various types of elements, deposited in the training history, are a large amount of wealth for improving learning deep models. In this survey, we comprehensively review and summarize the topic--``Historical Learning: Learning Models with Learning History'', which learns better neural models with the help of their learning history during its optimization, from three detailed aspects: Historical Type (what), Functional Part (where) and Storage Form (how). To our best knowledge, it is the first survey that systematically studies the methodologies which make use of various historical statistics when training deep neural networks. The discussions with related topics like recurrent/memory networks, ensemble learning, and reinforcement learning are demonstrated. We also expose future challenges of this topic and encourage the community to pay attention to the think of historical learning principles when designing algorithms. The paper list related to historical learning is available at \url{https://github.com/Martinser/Awesome-Historical-Learning.}
    Langevin Diffusion Variational Inference. (arXiv:2208.07743v2 [cs.LG] UPDATED)
    Many methods that build powerful variational distributions based on unadjusted Langevin transitions exist. Most of these were developed using a wide range of different approaches and techniques. Unfortunately, the lack of a unified analysis and derivation makes developing new methods and reasoning about existing ones a challenging task. We address this giving a single analysis that unifies and generalizes these existing techniques. The main idea is to augment the target and variational by numerically simulating the underdamped Langevin diffusion process and its time reversal. The benefits of this approach are twofold: it provides a unified formulation for many existing methods, and it simplifies the development of new ones. In fact, using our formulation we propose a new method that combines the strengths of previously existing algorithms; it uses underdamped Langevin transitions and powerful augmentations parameterized by a score network. Our empirical evaluation shows that our proposed method consistently outperforms relevant baselines in a wide range of tasks.
    Student Engagement Detection Using Emotion Analysis, Eye Tracking and Head Movement with Machine Learning. (arXiv:1909.12913v5 [cs.CV] UPDATED)
    With the increase of distance learning, in general, and e-learning, in particular, having a system capable of determining the engagement of students is of primordial importance, and one of the biggest challenges, both for teachers, researchers and policy makers. Here, we present a system to detect the engagement level of the students. It uses only information provided by the typical built-in web-camera present in a laptop computer, and was designed to work in real time. We combine information about the movements of the eyes and head, and facial emotions to produce a concentration index with three classes of engagement: "very engaged", "nominally engaged" and "not engaged at all". The system was tested in a typical e-learning scenario, and the results show that it correctly identifies each period of time where students were "very engaged", "nominally engaged" and "not engaged at all". Additionally, the results also show that the students with best scores also have higher concentration indexes.  ( 2 min )
    Convex mixed-integer optimization with Frank-Wolfe methods. (arXiv:2208.11010v5 [math.OC] UPDATED)
    Mixed-integer nonlinear optimization encompasses a broad class of problems that present both theoretical and computational challenges. We propose a new type of method to solve these problems based on a branch-and-bound algorithm with convex node relaxations. These relaxations are solved with a Frank-Wolfe algorithm over the convex hull of mixed-integer feasible points instead of the continuous relaxation via calls to a mixed-integer linear solver as the linear oracle. The proposed method computes feasible solutions while working on a single representation of the polyhedral constraints, leveraging the full extent of mixed-integer linear solvers without an outer approximation scheme and can exploit inexact solutions of node subproblems.
    Deep Generative Multi-Agent Imitation Model as a Computational Benchmark for Evaluating Human Performance in Complex Interactive Tasks: A Case Study in Football. (arXiv:2303.13323v1 [stat.ML])
    Evaluating the performance of human is a common need across many applications, such as in engineering and sports. When evaluating human performance in completing complex and interactive tasks, the most common way is to use a metric having been proved efficient for that context, or to use subjective measurement techniques. However, this can be an error prone and unreliable process since static metrics cannot capture all the complex contexts associated with such tasks and biases exist in subjective measurement. The objective of our research is to create data-driven AI agents as computational benchmarks to evaluate human performance in solving difficult tasks involving multiple humans and contextual factors. We demonstrate this within the context of football performance analysis. We train a generative model based on Conditional Variational Recurrent Neural Network (VRNN) Model on a large player and ball tracking dataset. The trained model is used to imitate the interactions between two teams and predict the performance from each team. Then the trained Conditional VRNN Model is used as a benchmark to evaluate team performance. The experimental results on Premier League football dataset demonstrates the usefulness of our method to existing state-of-the-art static metric used in football analytics.
    Realization of Causal Representation Learning to Adjust Confounding Bias in Latent Space. (arXiv:2211.08573v5 [cs.LG] UPDATED)
    Causal DAGs(Directed Acyclic Graphs) are usually considered in a 2D plane. Edges indicate causal effects' directions and imply their corresponding time-passings. Due to the natural restriction of statistical models, effect estimation is usually approximated by averaging the individuals' correlations, i.e., observational changes over a specific time. However, in the context of Machine Learning on large-scale questions with complex DAGs, such slight biases can snowball to distort global models - More importantly, it has practically impeded the development of AI, for instance, the weak generalizability of causal models. In this paper, we redefine causal DAG as \emph{do-DAG}, in which variables' values are no longer time-stamp-dependent, and timelines can be seen as axes. By geometric explanation of multi-dimensional do-DAG, we identify the \emph{Causal Representation Bias} and its necessary factors, differentiated from common confounding biases. Accordingly, a DL(Deep Learning)-based framework will be proposed as the general solution, along with a realization method and experiments to verify its feasibility.  ( 2 min )
    A dynamic risk score for early prediction of cardiogenic shock using machine learning. (arXiv:2303.12888v1 [cs.LG])
    Myocardial infarction and heart failure are major cardiovascular diseases that affect millions of people in the US. The morbidity and mortality are highest among patients who develop cardiogenic shock. Early recognition of cardiogenic shock is critical. Prompt implementation of treatment measures can prevent the deleterious spiral of ischemia, low blood pressure, and reduced cardiac output due to cardiogenic shock. However, early identification of cardiogenic shock has been challenging due to human providers' inability to process the enormous amount of data in the cardiac intensive care unit (ICU) and lack of an effective risk stratification tool. We developed a deep learning-based risk stratification tool, called CShock, for patients admitted into the cardiac ICU with acute decompensated heart failure and/or myocardial infarction to predict onset of cardiogenic shock. To develop and validate CShock, we annotated cardiac ICU datasets with physician adjudicated outcomes. CShock achieved an area under the receiver operator characteristic curve (AUROC) of 0.820, which substantially outperformed CardShock (AUROC 0.519), a well-established risk score for cardiogenic shock prognosis. CShock was externally validated in an independent patient cohort and achieved an AUROC of 0.800, demonstrating its generalizability in other cardiac ICUs.  ( 2 min )
    Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes. (arXiv:2110.15332v2 [cs.LG] UPDATED)
    In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates derived under the assumption of a perfect Markov decision process (MDP) model. Here we tackle this by considering off-policy evaluation in a partially observed MDP (POMDP). Specifically, we consider estimating the value of a given target policy in a POMDP given trajectories with only partial state observations generated by a different and unknown policy that may depend on the unobserved state. We tackle two questions: what conditions allow us to identify the target policy value from the observed data and, given identification, how to best estimate it. To answer these, we extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible by the existence of so-called bridge functions. We then show how to construct semiparametrically efficient estimators in these settings. We term the resulting framework proximal reinforcement learning (PRL). We demonstrate the benefits of PRL in an extensive simulation study and on the problem of sepsis management.  ( 2 min )
    Focusing on Potential Named Entities During Active Label Acquisition. (arXiv:2111.03837v2 [cs.CL] UPDATED)
    Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text and classify them into predefined named entity classes. While deep learning-based pre-trained language models help to achieve good predictive performances in NER, many domain-specific NER applications still call for a substantial amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used for NER tasks to minimize the annotation cost without sacrificing model performance. However, the heavily imbalanced class distribution of tokens introduces challenges in designing effective AL querying methods for NER. We propose several AL sentence query evaluation functions that pay more attention to potential positive tokens, and evaluate these proposed functions with both sentence-based and token-based cost evaluation strategies. We also propose a better data-driven normalization approach to penalize sentences that are too long or too short. Our experiments on three datasets from different domains reveal that the proposed approach reduces the number of annotated tokens while achieving better or comparable prediction performance with conventional methods.  ( 2 min )
    Variantional autoencoder with decremental information bottleneck for disentanglement. (arXiv:2303.12959v1 [cs.LG])
    One major challenge of disentanglement learning with variational autoencoders is the trade-off between disentanglement and reconstruction fidelity. Previous incremental methods with only on latent space cannot optimize these two targets simultaneously, so they expand the Information Bottleneck while training to {optimize from disentanglement to reconstruction. However, a large bottleneck will lose the constraint of disentanglement, causing the information diffusion problem. To tackle this issue, we present a novel decremental variational autoencoder with disentanglement-invariant transformations to optimize multiple objectives in different layers, termed DeVAE, for balancing disentanglement and reconstruction fidelity by decreasing the information bottleneck of diverse latent spaces gradually. Benefiting from the multiple latent spaces, DeVAE allows simultaneous optimization of multiple objectives to optimize reconstruction while keeping the constraint of disentanglement, avoiding information diffusion. DeVAE is also compatible with large models with high-dimension latent space. Experimental results on dSprites and Shapes3D that DeVAE achieves \fix{R2q6}{a good balance between disentanglement and reconstruction.DeVAE shows high tolerant of hyperparameters and on high-dimensional latent spaces.  ( 2 min )
    Planning Goals for Exploration. (arXiv:2303.13002v1 [cs.LG])
    Dropped into an unknown environment, what should an agent do to quickly learn about the environment and how to accomplish diverse tasks within it? We address this question within the goal-conditioned reinforcement learning paradigm, by identifying how the agent should set its goals at training time to maximize exploration. We propose "Planning Exploratory Goals" (PEG), a method that sets goals for each training episode to directly optimize an intrinsic exploration reward. PEG first chooses goal commands such that the agent's goal-conditioned policy, at its current level of training, will end up in states with high exploration potential. It then launches an exploration policy starting at those promising states. To enable this direct optimization, PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands". In challenging simulated robotics environments including a multi-legged ant robot in a maze, and a robot arm on a cluttered tabletop, PEG exploration enables more efficient and effective training of goal-conditioned policies relative to baselines and ablations. Our ant successfully navigates a long maze, and the robot arm successfully builds a stack of three blocks upon command. Website: https://penn-pal-lab.github.io/peg/  ( 2 min )
    Symmetries, flat minima, and the conserved quantities of gradient flow. (arXiv:2210.17216v2 [cs.LG] UPDATED)
    Empirical studies of the loss landscape of deep networks have revealed that many local minima are connected through low-loss valleys. Yet, little is known about the theoretical origin of such valleys. We present a general framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys. Our framework uses equivariances of the activation functions and can be applied to different layer architectures. To generalize this framework to nonlinear neural networks, we introduce a novel set of nonlinear, data-dependent symmetries. These symmetries can transform a trained model such that it performs similarly on new samples, which allows ensemble building that improves robustness under certain adversarial attacks. We then show that conserved quantities associated with linear symmetries can be used to define coordinates along low-loss valleys. The conserved quantities help reveal that using common initialization methods, gradient flow only explores a small part of the global minimum. By relating conserved quantities to convergence rate and sharpness of the minimum, we provide insights on how initialization impacts convergence and generalizability.  ( 2 min )
    Requirement Formalisation using Natural Language Processing and Machine Learning: A Systematic Review. (arXiv:2303.13365v1 [cs.CL])
    Improvement of software development methodologies attracts developers to automatic Requirement Formalisation (RF) in the Requirement Engineering (RE) field. The potential advantages by applying Natural Language Processing (NLP) and Machine Learning (ML) in reducing the ambiguity and incompleteness of requirement written in natural languages is reported in different studies. The goal of this paper is to survey and classify existing work on NLP and ML for RF, identifying challenges in this domain and providing promising future research directions. To achieve this, we conducted a systematic literature review to outline the current state-of-the-art of NLP and ML techniques in RF by selecting 257 papers from common used libraries. The search result is filtered by defining inclusion and exclusion criteria and 47 relevant studies between 2012 and 2022 are selected. We found that heuristic NLP approaches are the most common NLP techniques used for automatic RF, primary operating on structured and semi-structured data. This study also revealed that Deep Learning (DL) technique are not widely used, instead classical ML techniques are predominant in the surveyed studies. More importantly, we identified the difficulty of comparing the performance of different approaches due to the lack of standard benchmark cases for RF.  ( 2 min )
    Feature Reduction Method Comparison Towards Explainability and Efficiency in Cybersecurity Intrusion Detection Systems. (arXiv:2303.12891v1 [cs.LG])
    In the realm of cybersecurity, intrusion detection systems (IDS) detect and prevent attacks based on collected computer and network data. In recent research, IDS models have been constructed using machine learning (ML) and deep learning (DL) methods such as Random Forest (RF) and deep neural networks (DNN). Feature selection (FS) can be used to construct faster, more interpretable, and more accurate models. We look at three different FS techniques; RF information gain (RF-IG), correlation feature selection using the Bat Algorithm (CFS-BA), and CFS using the Aquila Optimizer (CFS-AO). Our results show CFS-BA to be the most efficient of the FS methods, building in 55% of the time of the best RF-IG model while achieving 99.99% of its accuracy. This reinforces prior contributions attesting to CFS-BA's accuracy while building upon the relationship between subset size, CFS score, and RF-IG score in final results.  ( 2 min )
    Evaluating the Robustness of Deep Reinforcement Learning for Autonomous Policies in a Multi-agent Urban Driving Environment. (arXiv:2112.11947v3 [cs.AI] UPDATED)
    Deep reinforcement learning is actively used for training autonomous car policies in a simulated driving environment. Due to the large availability of various reinforcement learning algorithms and the lack of their systematic comparison across different driving scenarios, we are unsure of which ones are more effective for training autonomous car software in single-agent as well as multi-agent driving environments. A benchmarking framework for the comparison of deep reinforcement learning in a vision-based autonomous driving will open up the possibilities for training better autonomous car driving policies. To address these challenges, we provide an open and reusable benchmarking framework for systematic evaluation and comparative analysis of deep reinforcement learning algorithms for autonomous driving in a single- and multi-agent environment. Using the framework, we perform a comparative study of discrete and continuous action space deep reinforcement learning algorithms. We also propose a comprehensive multi-objective reward function designed for the evaluation of deep reinforcement learning-based autonomous driving agents. We run the experiments in a vision-only high-fidelity urban driving simulated environments. The results indicate that only some of the deep reinforcement learning algorithms perform consistently better across single and multi-agent scenarios when trained in various multi-agent-only environment settings. For example, A3C- and TD3-based autonomous cars perform comparatively better in terms of more robust actions and minimal driving errors in both single and multi-agent scenarios. We conclude that different deep reinforcement learning algorithms exhibit different driving and testing performance in different scenarios, which underlines the need for their systematic comparative analysis. The benchmarking framework proposed in this paper facilitates such a comparison.  ( 3 min )
    Towards Better Dynamic Graph Learning: New Architecture and Unified Library. (arXiv:2303.13047v1 [cs.LG])
    We propose DyGFormer, a new Transformer-based architecture for dynamic graph learning that solely learns from the sequences of nodes' historical first-hop interactions. DyGFormer incorporates two distinct designs: a neighbor co-occurrence encoding scheme that explores the correlations of the source node and destination node based on their sequences; a patching technique that divides each sequence into multiple patches and feeds them to Transformer, allowing the model to effectively and efficiently benefit from longer histories. We also introduce DyGLib, a unified library with standard training pipelines, extensible coding interfaces, and comprehensive evaluating protocols to promote reproducible, scalable, and credible dynamic graph learning research. By performing extensive experiments on thirteen datasets from various domains for transductive/inductive dynamic link prediction and dynamic node classification tasks, we observe that: DyGFormer achieves state-of-the-art performance on most of the datasets, demonstrating the effectiveness of capturing nodes' correlations and long-term temporal dependencies; the results of baselines vary across different datasets and some findings are inconsistent with previous reports, which may be caused by their diverse pipelines and problematic implementations. We hope our work can provide new insights and facilitate the development of the dynamic graph learning field. All the resources including datasets, data loaders, algorithms, and executing scripts are publicly available at https://github.com/yule-BUAA/DyGLib.  ( 2 min )
    Learning Subgrid-scale Models with Neural Ordinary Differential Equations. (arXiv:2212.09967v2 [math.NA] UPDATED)
    We propose a new approach to learning the subgrid-scale model when simulating partial differential equations (PDEs) solved by the method of lines and their representation in chaotic ordinary differential equations, based on neural ordinary differential equations (NODEs). Solving systems with fine temporal and spatial grid scales is an ongoing computational challenge, and closure models are generally difficult to tune. Machine learning approaches have increased the accuracy and efficiency of computational fluid dynamics solvers. In this approach neural networks are used to learn the coarse- to fine-grid map, which can be viewed as subgrid-scale parameterization. We propose a strategy that uses the NODE and partial knowledge to learn the source dynamics at a continuous level. Our method inherits the advantages of NODEs and can be used to parameterize subgrid scales, approximate coupling operators, and improve the efficiency of low-order solvers. Numerical results with the two-scale Lorenz 96 ODE, the convection-diffusion PDE, and the viscous Burgers' PDE are used to illustrate this approach.  ( 2 min )
    A Data Augmentation Method and the Embedding Mechanism for Detection and Classification of Pulmonary Nodules on Small Samples. (arXiv:2303.12801v1 [eess.IV])
    Detection of pulmonary nodules by CT is used for screening lung cancer in early stages.omputer aided diagnosis (CAD) based on deep-learning method can identify the suspected areas of pulmonary nodules in CT images, thus improving the accuracy and efficiency of CT diagnosis. The accuracy and robustness of deep learning models. Method:In this paper, we explore (1) the data augmentation method based on the generation model and (2) the model structure improvement method based on the embedding mechanism. Two strategies have been introduced in this study: a new data augmentation method and a embedding mechanism. In the augmentation method, a 3D pixel-level statistics algorithm is proposed to generate pulmonary nodule and by combing the faked pulmonary nodule and healthy lung, we generate new pulmonary nodule samples. The embedding mechanism are designed to better understand the meaning of pixels of the pulmonary nodule samples by introducing hidden variables. Result: The result of the 3DVNET model with the augmentation method for pulmonary nodule detection shows that the proposed data augmentation method outperforms the method based on generative adversarial network (GAN) framework, training accuracy improved by 1.5%, and with embedding mechanism for pulmonary nodules classification shows that the embedding mechanism improves the accuracy and robustness for the classification of pulmonary nodules obviously, the model training accuracy is close to 1 and the model testing F1-score is 0.90.Conclusion:he proposed data augmentation method and embedding mechanism are beneficial to improve the accuracy and robustness of the model, and can be further applied in other common diagnostic imaging tasks.  ( 3 min )
    Three ways to improve feature alignment for open vocabulary detection. (arXiv:2303.13518v1 [cs.CV])
    The core problem in zero-shot open vocabulary detection is how to align visual and text features, so that the detector performs well on unseen classes. Previous approaches train the feature pyramid and detection head from scratch, which breaks the vision-text feature alignment established during pretraining, and struggles to prevent the language model from forgetting unseen classes. We propose three methods to alleviate these issues. Firstly, a simple scheme is used to augment the text embeddings which prevents overfitting to a small number of classes seen during training, while simultaneously saving memory and computation. Secondly, the feature pyramid network and the detection head are modified to include trainable gated shortcuts, which encourages vision-text feature alignment and guarantees it at the start of detection training. Finally, a self-training approach is used to leverage a larger corpus of image-text pairs thus improving detection performance on classes with no human annotated bounding boxes. Our three methods are evaluated on the zero-shot version of the LVIS benchmark, each of them showing clear and significant benefits. Our final network achieves the new stateof-the-art on the mAP-all metric and demonstrates competitive performance for mAP-rare, as well as superior transfer to COCO and Objects365.  ( 2 min )
    Normalizing Flows for Interventional Density Estimation. (arXiv:2209.06203v4 [cs.LG] UPDATED)
    Existing machine learning methods for causal inference usually estimate quantities expressed via the mean of potential outcomes (e.g., average treatment effect). However, such quantities do not capture the full information about the distribution of potential outcomes. In this work, we estimate the density of potential outcomes after interventions from observational data. For this, we propose a novel, fully-parametric deep learning method called Interventional Normalizing Flows. Specifically, we combine two normalizing flows, namely (i) a nuisance flow for estimating nuisance parameters and (ii) a target flow for a parametric estimation of the density of potential outcomes. We further develop a tractable optimization objective based on a one-step bias correction for an efficient and doubly robust estimation of the target flow parameters. As a result our Interventional Normalizing Flows offer a properly normalized density estimator. Across various experiments, we demonstrate that our Interventional Normalizing Flows are expressive and highly effective, and scale well with both sample size and high-dimensional confounding. To the best of our knowledge, our Interventional Normalizing Flows are the first proper fully-parametric, deep learning method for density estimation of potential outcomes.  ( 2 min )
    Faith-Shap: The Faithful Shapley Interaction Index. (arXiv:2203.00870v3 [cs.LG] UPDATED)
    Shapley values, which were originally designed to assign attributions to individual players in coalition games, have become a commonly used approach in explainable machine learning to provide attributions to input features for black-box machine learning models. A key attraction of Shapley values is that they uniquely satisfy a very natural set of axiomatic properties. However, extending the Shapley value to assigning attributions to interactions rather than individual players, an interaction index, is non-trivial: as the natural set of axioms for the original Shapley values, extended to the context of interactions, no longer specify a unique interaction index. Many proposals thus introduce additional less ''natural'' axioms, while sacrificing the key axiom of efficiency, in order to obtain unique interaction indices. In this work, rather than introduce additional conflicting axioms, we adopt the viewpoint of Shapley values as coefficients of the most faithful linear approximation to the pseudo-Boolean coalition game value function. By extending linear to $\ell$-order polynomial approximations, we can then define the general family of faithful interaction indices. We show that by additionally requiring the faithful interaction indices to satisfy interaction-extensions of the standard individual Shapley axioms (dummy, symmetry, linearity, and efficiency), we obtain a unique Faithful Shapley Interaction index, which we denote Faith-Shap, as a natural generalization of the Shapley value to interactions. We then provide some illustrative contrasts of Faith-Shap with previously proposed interaction indices, and further investigate some of its interesting algebraic properties. We further show the computational efficiency of computing Faith-Shap, together with some additional qualitative insights, via some illustrative experiments.  ( 3 min )
    Failure-tolerant Distributed Learning for Anomaly Detection in Wireless Networks. (arXiv:2303.13015v1 [cs.LG])
    The analysis of distributed techniques is often focused upon their efficiency, without considering their robustness (or lack thereof). Such a consideration is particularly important when devices or central servers can fail, which can potentially cripple distributed systems. When such failures arise in wireless communications networks, important services that they use/provide (like anomaly detection) can be left inoperable and can result in a cascade of security problems. In this paper, we present a novel method to address these risks by combining both flat- and star-topologies, combining the performance and reliability benefits of both. We refer to this method as "Tol-FL", due to its increased failure-tolerance as compared to the technique of Federated Learning. Our approach both limits device failure risks while outperforming prior methods by up to 8% in terms of anomaly detection AUROC in a range of realistic settings that consider client as well as server failure, all while reducing communication costs. This performance demonstrates that Tol-FL is a highly suitable method for distributed model training for anomaly detection, especially in the domain of wireless networks.
    FS-Real: Towards Real-World Cross-Device Federated Learning. (arXiv:2303.13363v1 [cs.LG])
    Federated Learning (FL) aims to train high-quality models in collaboration with distributed clients while not uploading their local data, which attracts increasing attention in both academia and industry. However, there is still a considerable gap between the flourishing FL research and real-world scenarios, mainly caused by the characteristics of heterogeneous devices and its scales. Most existing works conduct evaluations with homogeneous devices, which are mismatched with the diversity and variability of heterogeneous devices in real-world scenarios. Moreover, it is challenging to conduct research and development at scale with heterogeneous devices due to limited resources and complex software stacks. These two key factors are important yet underexplored in FL research as they directly impact the FL training dynamics and final performance, making the effectiveness and usability of FL algorithms unclear. To bridge the gap, in this paper, we propose an efficient and scalable prototyping system for real-world cross-device FL, FS-Real. It supports heterogeneous device runtime, contains parallelism and robustness enhanced FL server, and provides implementations and extensibility for advanced FL utility features such as personalization, communication compression and asynchronous aggregation. To demonstrate the usability and efficiency of FS-Real, we conduct extensive experiments with various device distributions, quantify and analyze the effect of the heterogeneous device and various scales, and further provide insights and open discussions about real-world FL scenarios. Our system is released to help to pave the way for further real-world FL research and broad applications involving diverse devices and scales.
    Self-Supervised Clustering of Multivariate Time-Series Data for Identifying TBI Physiological States. (arXiv:2303.13024v1 [cs.LG])
    Determining clinically relevant physiological states from multivariate time series data with missing values is essential for providing appropriate treatment for acute conditions such as Traumatic Brain Injury (TBI), respiratory failure, and heart failure. Utilizing non-temporal clustering or data imputation and aggregation techniques may lead to loss of valuable information and biased analyses. In our study, we apply the SLAC-Time algorithm, an innovative self-supervision-based approach that maintains data integrity by avoiding imputation or aggregation, offering a more useful representation of acute patient states. By using SLAC-Time to cluster data in a large research dataset, we identified three distinct TBI physiological states and their specific feature profiles. We employed various clustering evaluation metrics and incorporated input from a clinical domain expert to validate and interpret the identified physiological states. Further, we discovered how specific clinical events and interventions can influence patient states and state transitions.
    Leveraging Multi-time Hamilton-Jacobi PDEs for Certain Scientific Machine Learning Problems. (arXiv:2303.12928v1 [cs.LG])
    Hamilton-Jacobi partial differential equations (HJ PDEs) have deep connections with a wide range of fields, including optimal control, differential games, and imaging sciences. By considering the time variable to be a higher dimensional quantity, HJ PDEs can be extended to the multi-time case. In this paper, we establish a novel theoretical connection between specific optimization problems arising in machine learning and the multi-time Hopf formula, which corresponds to a representation of the solution to certain multi-time HJ PDEs. Through this connection, we increase the interpretability of the training process of certain machine learning applications by showing that when we solve these learning problems, we also solve a multi-time HJ PDE and, by extension, its corresponding optimal control problem. As a first exploration of this connection, we develop the relation between the regularized linear regression problem and the Linear Quadratic Regulator (LQR). We then leverage our theoretical connection to adapt standard LQR solvers (namely, those based on the Riccati ordinary differential equations) to design new training approaches for machine learning. Finally, we provide some numerical examples that demonstrate the versatility and possible computational advantages of our Riccati-based approach in the context of continual learning, post-training calibration, transfer learning, and sparse dynamics identification.
    Optimization and Optimizers for Adversarial Robustness. (arXiv:2303.13401v1 [cs.LG])
    Empirical robustness evaluation (RE) of deep learning models against adversarial perturbations entails solving nontrivial constrained optimization problems. Existing numerical algorithms that are commonly used to solve them in practice predominantly rely on projected gradient, and mostly handle perturbations modeled by the $\ell_1$, $\ell_2$ and $\ell_\infty$ distances. In this paper, we introduce a novel algorithmic framework that blends a general-purpose constrained-optimization solver PyGRANSO with Constraint Folding (PWCF), which can add more reliability and generality to the state-of-the-art RE packages, e.g., AutoAttack. Regarding reliability, PWCF provides solutions with stationarity measures and feasibility tests to assess the solution quality. For generality, PWCF can handle perturbation models that are typically inaccessible to the existing projected gradient methods; the main requirement is the distance metric to be almost everywhere differentiable. Taking advantage of PWCF and other existing numerical algorithms, we further explore the distinct patterns in the solutions found for solving these optimization problems using various combinations of losses, perturbation models, and optimization algorithms. We then discuss the implications of these patterns on the current robustness evaluation and adversarial training.  ( 2 min )
    ENVIDR: Implicit Differentiable Renderer with Neural Environment Lighting. (arXiv:2303.13022v1 [cs.CV])
    Recent advances in neural rendering have shown great potential for reconstructing scenes from multiview images. However, accurately representing objects with glossy surfaces remains a challenge for existing methods. In this work, we introduce ENVIDR, a rendering and modeling framework for high-quality rendering and reconstruction of surfaces with challenging specular reflections. To achieve this, we first propose a novel neural renderer with decomposed rendering components to learn the interaction between surface and environment lighting. This renderer is trained using existing physically based renderers and is decoupled from actual scene representations. We then propose an SDF-based neural surface model that leverages this learned neural renderer to represent general scenes. Our model additionally synthesizes indirect illuminations caused by inter-reflections from shiny surfaces by marching surface-reflected rays. We demonstrate that our method outperforms state-of-art methods on challenging shiny scenes, providing high-quality rendering of specular reflections while also enabling material editing and scene relighting.  ( 2 min )
    Interpersonal Distance Tracking with mmWave Radar and IMUs. (arXiv:2303.12798v1 [cs.NI])
    Tracking interpersonal distances is essential for real-time social distancing management and {\em ex-post} contact tracing to prevent spreads of contagious diseases. Bluetooth neighbor discovery has been employed for such purposes in combating COVID-19, but does not provide satisfactory spatiotemporal resolutions. This paper presents ImmTrack, a system that uses a millimeter wave radar and exploits the inertial measurement data from user-carried smartphones or wearables to track interpersonal distances. By matching the movement traces reconstructed from the radar and inertial data, the pseudo identities of the inertial data can be transferred to the radar sensing results in the global coordinate system. The re-identified, radar-sensed movement trajectories are then used to track interpersonal distances. In a broader sense, ImmTrack is the first system that fuses data from millimeter wave radar and inertial measurement units for simultaneous user tracking and re-identification. Evaluation with up to 27 people in various indoor/outdoor environments shows ImmTrack's decimeters-seconds spatiotemporal accuracy in contact tracing, which is similar to that of the privacy-intrusive camera surveillance and significantly outperforms the Bluetooth neighbor discovery approach.  ( 2 min )
    Interpreting learning in biological neural networks as zero-order optimization method. (arXiv:2301.11777v2 [cs.LG] UPDATED)
    Recently, significant progress has been made regarding the statistical understanding of artificial neural networks (ANNs). ANNs are motivated by the functioning of the brain, but differ in several crucial aspects. In particular, the locality in the updating rule of the connection parameters in biological neural networks (BNNs) makes it biologically implausible that the learning of the brain is based on gradient descent. In this work, we look at the brain as a statistical method for supervised learning. The main contribution is to relate the local updating rule of the connection parameters in BNNs to a zero-order optimization method. It is shown that the expected values of the iterates implement a modification of gradient descent.  ( 2 min )
    The Quantization Model of Neural Scaling. (arXiv:2303.13506v1 [cs.LG])
    We propose the $\textit{Quantization Model}$ of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. We derive this model from what we call the $\textit{Quantization Hypothesis}$, where learned network capabilities are quantized into discrete chunks ($\textit{quanta}$). We show that when quanta are learned in order of decreasing use frequency, then a power law in use frequencies explains observed power law scaling of loss. We validate this prediction on toy datasets, then study how scaling curves decompose for large language models. Using language model internals, we auto-discover diverse model capabilities (quanta) and find tentative evidence that the distribution over corresponding subproblems in the prediction of natural text is compatible with the power law predicted from the neural scaling exponent as predicted from our theory.  ( 2 min )
    Compositional Zero-Shot Domain Transfer with Text-to-Text Models. (arXiv:2303.13386v1 [cs.CL])
    Label scarcity is a bottleneck for improving task performance in specialised domains. We propose a novel compositional transfer learning framework (DoT5 - domain compositional zero-shot T5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from MLM of unlabelled in-domain free text) and task knowledge (from task training on more readily available general-domain data) in a multi-task manner. To improve the transferability of task training, we design a strategy named NLGU: we simultaneously train NLG for in-domain label-to-data generation which enables data augmentation for self-finetuning and NLU for label prediction. We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on NLI, text summarisation and embedding learning. DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning. In particular, DoT5 outperforms the current SOTA in zero-shot transfer by over 7 absolute points in accuracy on RadNLI. We validate DoT5 with ablations and a case study demonstrating its ability to solve challenging NLI examples requiring in-domain expertise.  ( 2 min )
    Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization. (arXiv:2301.07784v2 [cs.LG] UPDATED)
    Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences over (possibly conflicting) reward functions. Such algorithms often learn a set of policies (each optimized for a particular agent preference) that can later be used to solve problems with novel preferences. We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes that improve sample-efficient learning. They implement active-learning strategies by which the agent can (i) identify the most promising preferences/objectives to train on at each moment, to more rapidly solve a given MORL problem; and (ii) identify which previous experiences are most relevant when learning a policy for a particular agent preference, via a novel Dyna-style MORL method. We prove our algorithm is guaranteed to always converge to an optimal solution in a finite number of steps, or an $\epsilon$-optimal solution (for a bounded $\epsilon$) if the agent is limited and can only identify possibly sub-optimal policies. We also prove that our method monotonically improves the quality of its partial solutions while learning. Finally, we introduce a bound that characterizes the maximum utility loss (with respect to the optimal solution) incurred by the partial solutions computed by our method throughout learning. We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks, both with discrete and continuous state and action spaces.  ( 3 min )
    CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning. (arXiv:2303.10365v2 [cs.LG] UPDATED)
    Partial-label learning (PLL) is an important weakly supervised learning problem, which allows each training example to have a candidate label set instead of a single ground-truth label. Identification-based methods have been widely explored to tackle label ambiguity issues in PLL, which regard the true label as a latent variable to be identified. However, identifying the true labels accurately and completely remains challenging, causing noise in pseudo labels during model training. In this paper, we propose a new method called CroSel, which leverages historical prediction information from models to identify true labels for most training examples. First, we introduce a cross selection strategy, which enables two deep models to select true labels of partially labeled data for each other. Besides, we propose a novel consistent regularization term called co-mix to avoid sample waste and tiny noise caused by false selection. In this way, CroSel can pick out the true labels of most examples with high precision. Extensive experiments demonstrate the superiority of CroSel, which consistently outperforms previous state-of-the-art methods on benchmark datasets. Additionally, our method achieves over 90\% accuracy and quantity for selecting true labels on CIFAR-type datasets under various settings.  ( 2 min )
    Omnigrok: Grokking Beyond Algorithmic Data. (arXiv:2210.01117v2 [cs.LG] UPDATED)
    Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks, identifying the mismatch between training and test losses as the cause for grokking. We refer to this as the "LU mechanism" because training and test losses (against model weight norm) typically resemble "L" and "U", respectively. This simple mechanism can nicely explain many aspects of grokking: data size dependence, weight decay dependence, the emergence of representations, etc. Guided by the intuitive picture, we are able to induce grokking on tasks involving images, language and molecules. In the reverse direction, we are able to eliminate grokking for algorithmic datasets. We attribute the dramatic nature of grokking for algorithmic datasets to representation learning.  ( 2 min )
    Predicting the performance of hybrid ventilation in buildings using a multivariate attention-based biLSTM Encoder-Decoder neural network. (arXiv:2302.04126v2 [cs.LG] UPDATED)
    Hybrid ventilation is an energy-efficient solution to provide fresh air for most climates, given that it has a reliable control system. To operate such systems optimally, a high-fidelity control-oriented modesl is required. It should enable near-real time forecast of the indoor air temperature based on operational conditions such as window opening and HVAC operating schedules. However, physics-based control-oriented models (i.e., white-box models) are labour-intensive and computationally expensive. Alternatively, black-box models based on artificial neural networks can be trained to be good estimators for building dynamics. This paper investigates the capabilities of a deep neural network (DNN), which is a multivariate multi-head attention-based long short-term memory (LSTM) encoder-decoder neural network, to predict indoor air temperature when windows are opened or closed. Training and test data are generated from a detailed multi-zone office building model (EnergyPlus). Pseudo-random signals are used for the indoor air temperature setpoints and window opening instances. The results indicate that the DNN is able to accurately predict the indoor air temperature of five zones whenever windows are opened or closed. The prediction error plateaus after the 24th step ahead prediction (6 hr ahead prediction).  ( 2 min )
    Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma. (arXiv:2303.12806v1 [q-bio.QM])
    Although artificial intelligence (AI) systems have been shown to improve the accuracy of initial melanoma diagnosis, the lack of transparency in how these systems identify melanoma poses severe obstacles to user acceptance. Explainable artificial intelligence (XAI) methods can help to increase transparency, but most XAI methods are unable to produce precisely located domain-specific explanations, making the explanations difficult to interpret. Moreover, the impact of XAI methods on dermatologists has not yet been evaluated. Extending on two existing classifiers, we developed an XAI system that produces text and region based explanations that are easily interpretable by dermatologists alongside its differential diagnoses of melanomas and nevi. To evaluate this system, we conducted a three-part reader study to assess its impact on clinicians' diagnostic accuracy, confidence, and trust in the XAI-support. We showed that our XAI's explanations were highly aligned with clinicians' explanations and that both the clinicians' trust in the support system and their confidence in their diagnoses were significantly increased when using our XAI compared to using a conventional AI system. The clinicians' diagnostic accuracy was numerically, albeit not significantly, increased. This work demonstrates that clinicians are willing to adopt such an XAI system, motivating their future use in the clinic.  ( 3 min )
    Human Uncertainty in Concept-Based AI Systems. (arXiv:2303.12872v1 [cs.HC])
    Placing a human in the loop may abate the risks of deploying AI systems in safety-critical settings (e.g., a clinician working with a medical AI system). However, mitigating risks arising from human error and uncertainty within such human-AI interactions is an important and understudied issue. In this work, we study human uncertainty in the context of concept-based models, a family of AI systems that enable human feedback via concept interventions where an expert intervenes on human-interpretable concepts relevant to the task. Prior work in this space often assumes that humans are oracles who are always certain and correct. Yet, real-world decision-making by humans is prone to occasional mistakes and uncertainty. We study how existing concept-based models deal with uncertain interventions from humans using two novel datasets: UMNIST, a visual dataset with controlled simulated uncertainty based on the MNIST dataset, and CUB-S, a relabeling of the popular CUB concept dataset with rich, densely-annotated soft labels from humans. We show that training with uncertain concept labels may help mitigate weaknesses of concept-based systems when handling uncertain interventions. These results allow us to identify several open challenges, which we argue can be tackled through future multidisciplinary research on building interactive uncertainty-aware systems. To facilitate further research, we release a new elicitation platform, UElic, to collect uncertain feedback from humans in collaborative prediction tasks.  ( 2 min )
    Anti-symmetric Barron functions and their approximation with sums of determinants. (arXiv:2303.12856v1 [math.NA])
    A fundamental problem in quantum physics is to encode functions that are completely anti-symmetric under permutations of identical particles. The Barron space consists of high-dimensional functions that can be parameterized by infinite neural networks with one hidden layer. By explicitly encoding the anti-symmetric structure, we prove that the anti-symmetric functions which belong to the Barron space can be efficiently approximated with sums of determinants. This yields a factorial improvement in complexity compared to the standard representation in the Barron space and provides a theoretical explanation for the effectiveness of determinant-based architectures in ab-initio quantum chemistry.  ( 2 min )
    NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions. (arXiv:2303.12865v1 [cs.CV])
    Pose-conditioned convolutional generative models struggle with high-quality 3D-consistent image generation from single-view datasets, due to their lack of sufficient 3D priors. Recently, the integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs), has transformed 3D-aware generation from single-view images. NeRF-GANs exploit the strong inductive bias of 3D neural representations and volumetric rendering at the cost of higher computational complexity. This study aims at revisiting pose-conditioned 2D GANs for efficient 3D-aware generation at inference time by distilling 3D knowledge from pretrained NeRF-GANS. We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations. Experiments on several datasets demonstrate that the proposed method obtains results comparable with volumetric rendering in terms of quality and 3D consistency while benefiting from the superior computational advantage of convolutional networks. The code will be available at: https://github.com/mshahbazi72/NeRF-GAN-Distillation  ( 2 min )
    From Wide to Deep: Dimension Lifting Network for Parameter-efficient Knowledge Graph Embedding. (arXiv:2303.12816v1 [cs.LG])
    Knowledge graph embedding (KGE) that maps entities and relations into vector representations is essential for downstream tasks. Conventional KGE methods require relatively high-dimensional entity representations to preserve the structural information of knowledge graph, but lead to oversized model parameters. Recent methods reduce model parameters by adopting low-dimensional entity representations, while developing techniques (e.g., knowledge distillation) to compensate for the reduced dimension. However, such operations produce degraded model accuracy and limited reduction of model parameters. Specifically, we view the concatenation of all entity representations as an embedding layer, and then conventional KGE methods that adopt high-dimensional entity representations equal to enlarging the width of the embedding layer to gain expressiveness. To achieve parameter efficiency without sacrificing accuracy, we instead increase the depth and propose a deeper embedding network for entity representations, i.e., a narrow embedding layer and a multi-layer dimension lifting network (LiftNet). Experiments on three public datasets show that the proposed method (implemented based on TransE and DistMult) with 4-dimensional entity representations achieves more accurate link prediction results than counterpart parameter-efficient KGE methods and strong KGE baselines, including TransE and DistMult with 512-dimensional entity representations.  ( 2 min )
    Distributed Learning Meets 6G: A Communication and Computing Perspective. (arXiv:2303.12802v1 [cs.NI])
    With the ever-improving computing capabilities and storage capacities of mobile devices in line with evolving telecommunication network paradigms, there has been an explosion of research interest towards exploring Distributed Learning (DL) frameworks to realize stringent key performance indicators (KPIs) that are expected in next-generation/6G cellular networks. In conjunction with Edge Computing, Federated Learning (FL) has emerged as the DL architecture of choice in prominent wireless applications. This article lays an outline of how DL in general and FL-based strategies specifically can contribute towards realizing a part of the 6G vision and strike a balance between communication and computing constraints. As a practical use case, we apply Multi-Agent Reinforcement Learning (MARL) within the FL framework to the Dynamic Spectrum Access (DSA) problem and present preliminary evaluation results. Top contemporary challenges in applying DL approaches to 6G networks are also highlighted.  ( 2 min )
    Fixed points of arbitrarily deep 1-dimensional neural networks. (arXiv:2303.12814v1 [stat.ML])
    In this paper, we introduce a new class of functions on $\mathbb{R}$ that is closed under composition, and contains the logistic sigmoid function. We use this class to show that any 1-dimensional neural network of arbitrary depth with logistic sigmoid activation functions has at most three fixed points. While such neural networks are far from real world applications, we are able to completely understand their fixed points, providing a foundation to the much needed connection between application and theory of deep neural networks.  ( 2 min )
    Forecast-Aware Model Driven LSTM. (arXiv:2303.12963v1 [cs.LG])
    Poor air quality can have a significant impact on human health. The National Oceanic and Atmospheric Administration (NOAA) air quality forecasting guidance is challenged by the increasing presence of extreme air quality events due to extreme weather events such as wild fires and heatwaves. These extreme air quality events further affect human health. Traditional methods used to correct model bias make assumptions about linearity and the underlying distribution. Extreme air quality events tend to occur without a strong signal leading up to the event and this behavior tends to cause existing methods to either under or over compensate for the bias. Deep learning holds promise for air quality forecasting in the presence of extreme air quality events due to its ability to generalize and learn nonlinear problems. However, in the presence of these anomalous air quality events, standard deep network approaches that use a single network for generalizing to future forecasts, may not always provide the best performance even with a full feature-set including geography and meteorology. In this work we describe a method that combines unsupervised learning and a forecast-aware bi-directional LSTM network to perform bias correction for operational air quality forecasting using AirNow station data for ozone and PM2.5 in the continental US. Using an unsupervised clustering method trained on station geographical features such as latitude and longitude, urbanization, and elevation, the learned clusters direct training by partitioning the training data for the LSTM networks. LSTMs are forecast-aware and implemented using a unique way to perform learning forward and backwards in time across forecasting days. When comparing the RMSE of the forecast model to the RMSE of the bias corrected model, the bias corrected model shows significant improvement (27\% lower RMSE for ozone) over the base forecast.  ( 2 min )
    Reinforcement Learning with Exogenous States and Rewards. (arXiv:2303.12957v1 [cs.LG])
    Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal. This paper formalizes exogenous state variables and rewards and shows that if the reward function decomposes additively into endogenous and exogenous components, the MDP can be decomposed into an exogenous Markov Reward Process (based on the exogenous reward) and an endogenous Markov Decision Process (optimizing the endogenous reward). Any optimal policy for the endogenous MDP is also an optimal policy for the original MDP, but because the endogenous reward typically has reduced variance, the endogenous MDP is easier to solve. We study settings where the decomposition of the state space into exogenous and endogenous state spaces is not given but must be discovered. The paper introduces and proves correctness of algorithms for discovering the exogenous and endogenous subspaces of the state space when they are mixed through linear combination. These algorithms can be applied during reinforcement learning to discover the exogenous space, remove the exogenous reward, and focus reinforcement learning on the endogenous MDP. Experiments on a variety of challenging synthetic MDPs show that these methods, applied online, discover large exogenous state spaces and produce substantial speedups in reinforcement learning.  ( 2 min )
    A Comparison of Graph Neural Networks for Malware Classification. (arXiv:2303.12812v1 [cs.LG])
    Managing the threat posed by malware requires accurate detection and classification techniques. Traditional detection strategies, such as signature scanning, rely on manual analysis of malware to extract relevant features, which is labor intensive and requires expert knowledge. Function call graphs consist of a set of program functions and their inter-procedural calls, providing a rich source of information that can be leveraged to classify malware without the labor intensive feature extraction step of traditional techniques. In this research, we treat malware classification as a graph classification problem. Based on Local Degree Profile features, we train a wide range of Graph Neural Network (GNN) architectures to generate embeddings which we then classify. We find that our best GNN models outperform previous comparable research involving the well-known MalNet-Tiny Android malware dataset. In addition, our GNN models do not suffer from the overfitting issues that commonly afflict non-GNN techniques, although GNN models require longer training times.  ( 2 min )
    The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs. (arXiv:2303.12961v1 [cs.LG])
    The successes of foundation models such as ChatGPT and AlphaFold have spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. We review over 80 foundation models trained on non-imaging EMR data (i.e. clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g. MIMIC-III) or broad, public biomedical corpora (e.g. PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. In light of these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded to metrics that matter in healthcare.  ( 2 min )
    IoT Device Identification Based on Network Communication Analysis Using Deep Learning. (arXiv:2303.12800v1 [cs.NI])
    Attack vectors for adversaries have increased in organizations because of the growing use of less secure IoT devices. The risk of attacks on an organization's network has also increased due to the bring your own device (BYOD) policy which permits employees to bring IoT devices onto the premises and attach them to the organization's network. To tackle this threat and protect their networks, organizations generally implement security policies in which only white listed IoT devices are allowed on the organization's network. To monitor compliance with such policies, it has become essential to distinguish IoT devices permitted within an organization's network from non white listed (unknown) IoT devices. In this research, deep learning is applied to network communication for the automated identification of IoT devices permitted on the network. In contrast to existing methods, the proposed approach does not require complex feature engineering of the network communication, because the 'communication behavior' of IoT devices is represented as small images which are generated from the device's network communication payload. The proposed approach is applicable for any IoT device, regardless of the protocol used for communication. As our approach relies on the network communication payload, it is also applicable for the IoT devices behind a network address translation (NAT) enabled router. In this study, we trained various classifiers on a publicly accessible dataset to identify IoT devices in different scenarios, including the identification of known and unknown IoT devices, achieving over 99% overall average detection accuracy.  ( 3 min )
    Features matching using natural language processing. (arXiv:2303.12804v1 [cs.DB])
    The feature matching is a basic step in matching different datasets. This article proposes shows a new hybrid model of a pretrained Natural Language Processing (NLP) based model called BERT used in parallel with a statistical model based on Jaccard similarity to measure the similarity between list of features from two different datasets. This reduces the time required to search for correlations or manually match each feature from one dataset to another.  ( 2 min )
    Connected Superlevel Set in (Deep) Reinforcement Learning and its Application to Minimax Theorems. (arXiv:2303.12981v1 [cs.LG])
    The aim of this paper is to improve the understanding of the optimization landscape for policy optimization problems in reinforcement learning. Specifically, we show that the superlevel set of the objective function with respect to the policy parameter is always a connected set both in the tabular setting and under policies represented by a class of neural networks. In addition, we show that the optimization objective as a function of the policy parameter and reward satisfies a stronger "equiconnectedness" property. To our best knowledge, these are novel and previously unknown discoveries. We present an application of the connectedness of these superlevel sets to the derivation of minimax theorems for robust reinforcement learning. We show that any minimax optimization program which is convex on one side and is equiconnected on the other side observes the minimax equality (i.e. has a Nash equilibrium). We find that this exact structure is exhibited by an interesting robust reinforcement learning problem under an adversarial reward attack, and the validity of its minimax equality immediately follows. This is the first time such a result is established in the literature.  ( 2 min )
    An algorithmic framework for the optimization of deep neural networks architectures and hyperparameters. (arXiv:2303.12797v1 [cs.NE])
    In this paper, we propose an algorithmic framework to automatically generate efficient deep neural networks and optimize their associated hyperparameters. The framework is based on evolving directed acyclic graphs (DAGs), defining a more flexible search space than the existing ones in the literature. It allows mixtures of different classical operations: convolutions, recurrences and dense layers, but also more newfangled operations such as self-attention. Based on this search space we propose neighbourhood and evolution search operators to optimize both the architecture and hyper-parameters of our networks. These search operators can be used with any metaheuristic capable of handling mixed search spaces. We tested our algorithmic framework with an evolutionary algorithm on a time series prediction benchmark. The results demonstrate that our framework was able to find models outperforming the established baseline on numerous datasets.  ( 2 min )
    Three iterations of $(1-d)$-WL test distinguish non isometric clouds of $d$-dimensional points. (arXiv:2303.12853v1 [cs.LG])
    The Weisfeiler--Lehman (WL) test is a fundamental iterative algorithm for checking isomorphism of graphs. It has also been observed that it underlies the design of several graph neural network architectures, whose capabilities and performance can be understood in terms of the expressive power of this test. Motivated by recent developments in machine learning applications to datasets involving three-dimensional objects, we study when the WL test is {\em complete} for clouds of euclidean points represented by complete distance graphs, i.e., when it can distinguish, up to isometry, any arbitrary such cloud. Our main result states that the $(d-1)$-dimensional WL test is complete for point clouds in $d$-dimensional Euclidean space, for any $d\ge 2$, and that only three iterations of the test suffice. Our result is tight for $d = 2, 3$. We also observe that the $d$-dimensional WL test only requires one iteration to achieve completeness.  ( 2 min )
    Named Entity Recognition Based Automatic Generation of Research Highlights. (arXiv:2303.12795v1 [cs.CL])
    A scientific paper is traditionally prefaced by an abstract that summarizes the paper. Recently, research highlights that focus on the main findings of the paper have emerged as a complementary summary in addition to an abstract. However, highlights are not yet as common as abstracts, and are absent in many papers. In this paper, we aim to automatically generate research highlights using different sections of a research paper as input. We investigate whether the use of named entity recognition on the input improves the quality of the generated highlights. In particular, we have used two deep learning-based models: the first is a pointer-generator network, and the second augments the first model with coverage mechanism. We then augment each of the above models with named entity recognition features. The proposed method can be used to produce highlights for papers with missing highlights. Our experiments show that adding named entity information improves the performance of the deep learning-based summarizers in terms of ROUGE, METEOR and BERTScore measures.  ( 2 min )
    Stability is Stable: Connections between Replicability, Privacy, and Adaptive Generalization. (arXiv:2303.12921v1 [cs.LG])
    The notion of replicable algorithms was introduced in Impagliazzo et al. [STOC '22] to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set. In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy, and replicability for a broad class of statistical problems. Conversely, we show any such equivalence must break down computationally: there exist statistical problems that are easy under differential privacy, but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of $\delta$ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions.  ( 3 min )
    Cross-Layer Design for AI Acceleration with Non-Coherent Optical Computing. (arXiv:2303.12910v1 [cs.LG])
    Emerging AI applications such as ChatGPT, graph convolutional networks, and other deep neural networks require massive computational resources for training and inference. Contemporary computing platforms such as CPUs, GPUs, and TPUs are struggling to keep up with the demands of these AI applications. Non-coherent optical computing represents a promising approach for light-speed acceleration of AI workloads. In this paper, we show how cross-layer design can overcome challenges in non-coherent optical computing platforms. We describe approaches for optical device engineering, tuning circuit enhancements, and architectural innovations to adapt optical computing to a variety of AI workloads. We also discuss techniques for hardware/software co-design that can intelligently map and adapt AI software to improve its performance on non-coherent optical computing platforms.  ( 2 min )
    Co-Speech Gesture Synthesis using Discrete Gesture Token Learning. (arXiv:2303.12822v1 [cs.CV])
    Synthesizing realistic co-speech gestures is an important and yet unsolved problem for creating believable motions that can drive a humanoid robot to interact and communicate with human users. Such capability will improve the impressions of the robots by human users and will find applications in education, training, and medical services. One challenge in learning the co-speech gesture model is that there may be multiple viable gesture motions for the same speech utterance. The deterministic regression methods can not resolve the conflicting samples and may produce over-smoothed or damped motions. We proposed a two-stage model to address this uncertainty issue in gesture synthesis by modeling the gesture segments as discrete latent codes. Our method utilizes RQ-VAE in the first stage to learn a discrete codebook consisting of gesture tokens from training data. In the second stage, a two-level autoregressive transformer model is used to learn the prior distribution of residual codes conditioned on input speech context. Since the inference is formulated as token sampling, multiple gesture sequences could be generated given the same speech input using top-k sampling. The quantitative results and the user study showed the proposed method outperforms the previous methods and is able to generate realistic and diverse gesture motions.  ( 2 min )
  • Open

    Reinforcement Learning with Exogenous States and Rewards. (arXiv:2303.12957v1 [cs.LG])
    Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal. This paper formalizes exogenous state variables and rewards and shows that if the reward function decomposes additively into endogenous and exogenous components, the MDP can be decomposed into an exogenous Markov Reward Process (based on the exogenous reward) and an endogenous Markov Decision Process (optimizing the endogenous reward). Any optimal policy for the endogenous MDP is also an optimal policy for the original MDP, but because the endogenous reward typically has reduced variance, the endogenous MDP is easier to solve. We study settings where the decomposition of the state space into exogenous and endogenous state spaces is not given but must be discovered. The paper introduces and proves correctness of algorithms for discovering the exogenous and endogenous subspaces of the state space when they are mixed through linear combination. These algorithms can be applied during reinforcement learning to discover the exogenous space, remove the exogenous reward, and focus reinforcement learning on the endogenous MDP. Experiments on a variety of challenging synthetic MDPs show that these methods, applied online, discover large exogenous state spaces and produce substantial speedups in reinforcement learning.
    Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues. (arXiv:2303.12878v1 [cs.LG])
    As the issue of robustness in AI systems becomes vital, statistical learning techniques that are reliable even in presence of partly contaminated data have to be developed. Preference data, in the form of (complete) rankings in the simplest situations, are no exception and the demand for appropriate concepts and tools is all the more pressing given that technologies fed by or producing this type of data (e.g. search engines, recommending systems) are now massively deployed. However, the lack of vector space structure for the set of rankings (i.e. the symmetric group $\mathfrak{S}_n$) and the complex nature of statistics considered in ranking data analysis make the formulation of robustness objectives in this domain challenging. In this paper, we introduce notions of robustness, together with dedicated statistical methods, for Consensus Ranking the flagship problem in ranking data analysis, aiming at summarizing a probability distribution on $\mathfrak{S}_n$ by a median ranking. Precisely, we propose specific extensions of the popular concept of breakdown point, tailored to consensus ranking, and address the related computational issues. Beyond the theoretical contributions, the relevance of the approach proposed is supported by an experimental study.
    Generalized Data Thinning Using Sufficient Statistics. (arXiv:2303.12931v1 [stat.ME])
    Our goal is to develop a general strategy to decompose a random variable $X$ into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, $X$ can be "thinned" into independent random variables $X^{(1)}, \ldots, X^{(K)}$, such that $X = \sum_{k=1}^K X^{(k)}$. In this paper, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct $X$. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families.  ( 2 min )
    Interacting Particle Langevin Algorithm for Maximum Marginal Likelihood Estimation. (arXiv:2303.13429v1 [stat.CO])
    We study a class of interacting particle systems for implementing a marginal maximum likelihood estimation (MLE) procedure to optimize over the parameters of a latent variable model. To do so, we propose a continuous-time interacting particle system which can be seen as a Langevin diffusion over an extended state space, where the number of particles acts as the inverse temperature parameter in classical settings for optimisation. Using Langevin diffusions, we prove nonasymptotic concentration bounds for the optimisation error of the maximum marginal likelihood estimator in terms of the number of particles in the particle system, the number of iterations of the algorithm, and the step-size parameter for the time discretisation analysis.  ( 2 min )
    Fixed points of arbitrarily deep 1-dimensional neural networks. (arXiv:2303.12814v1 [stat.ML])
    In this paper, we introduce a new class of functions on $\mathbb{R}$ that is closed under composition, and contains the logistic sigmoid function. We use this class to show that any 1-dimensional neural network of arbitrary depth with logistic sigmoid activation functions has at most three fixed points. While such neural networks are far from real world applications, we are able to completely understand their fixed points, providing a foundation to the much needed connection between application and theory of deep neural networks.  ( 2 min )
    Deep Generative Multi-Agent Imitation Model as a Computational Benchmark for Evaluating Human Performance in Complex Interactive Tasks: A Case Study in Football. (arXiv:2303.13323v1 [stat.ML])
    Evaluating the performance of human is a common need across many applications, such as in engineering and sports. When evaluating human performance in completing complex and interactive tasks, the most common way is to use a metric having been proved efficient for that context, or to use subjective measurement techniques. However, this can be an error prone and unreliable process since static metrics cannot capture all the complex contexts associated with such tasks and biases exist in subjective measurement. The objective of our research is to create data-driven AI agents as computational benchmarks to evaluate human performance in solving difficult tasks involving multiple humans and contextual factors. We demonstrate this within the context of football performance analysis. We train a generative model based on Conditional Variational Recurrent Neural Network (VRNN) Model on a large player and ball tracking dataset. The trained model is used to imitate the interactions between two teams and predict the performance from each team. Then the trained Conditional VRNN Model is used as a benchmark to evaluate team performance. The experimental results on Premier League football dataset demonstrates the usefulness of our method to existing state-of-the-art static metric used in football analytics.  ( 2 min )
    Omnigrok: Grokking Beyond Algorithmic Data. (arXiv:2210.01117v2 [cs.LG] UPDATED)
    Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks, identifying the mismatch between training and test losses as the cause for grokking. We refer to this as the "LU mechanism" because training and test losses (against model weight norm) typically resemble "L" and "U", respectively. This simple mechanism can nicely explain many aspects of grokking: data size dependence, weight decay dependence, the emergence of representations, etc. Guided by the intuitive picture, we are able to induce grokking on tasks involving images, language and molecules. In the reverse direction, we are able to eliminate grokking for algorithmic datasets. We attribute the dramatic nature of grokking for algorithmic datasets to representation learning.  ( 2 min )
    Generalization with quantum geometry for learning unitaries. (arXiv:2303.13462v1 [quant-ph])
    Generalization is the ability of quantum machine learning models to make accurate predictions on new data by learning from training data. Here, we introduce the data quantum Fisher information metric (DQFIM) to determine when a model can generalize. For variational learning of unitaries, the DQFIM quantifies the amount of circuit parameters and training data needed to successfully train and generalize. We apply the DQFIM to explain when a constant number of training states and polynomial number of parameters are sufficient for generalization. Further, we can improve generalization by removing symmetries from training data. Finally, we show that out-of-distribution generalization, where training and testing data are drawn from different data distributions, can be better than using the same distribution. Our work opens up new approaches to improve generalization in quantum machine learning.  ( 2 min )
    The Variational Method of Moments. (arXiv:2012.09422v4 [cs.LG] UPDATED)
    The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression. A standard approach reduces the problem to a finite set of marginal moment conditions and applies the optimally weighted generalized method of moments (OWGMM), but this requires we know a finite set of identifying moments, can still be inefficient even if identifying, or can be theoretically efficient but practically unwieldy if we use a growing sieve of moment conditions. Motivated by a variational minimax reformulation of OWGMM, we define a very general class of estimators for the conditional moment problem, which we term the variational method of moments (VMM) and which naturally enables controlling infinitely-many moments. We provide a detailed theoretical analysis of multiple VMM estimators, including ones based on kernel methods and neural nets, and provide conditions under which these are consistent, asymptotically normal, and semiparametrically efficient in the full conditional moment model. We additionally provide algorithms for valid statistical inference based on the same kind of variational reformulations, both for kernel- and neural-net-based varieties. Finally, we demonstrate the strong performance of our proposed estimation and inference algorithms in a detailed series of synthetic experiments.  ( 2 min )
    Preference-Aware Constrained Multi-Objective Bayesian Optimization. (arXiv:2303.13034v1 [cs.LG])
    This paper addresses the problem of constrained multi-objective optimization over black-box objective functions with practitioner-specified preferences over the objectives when a large fraction of the input space is infeasible (i.e., violates constraints). This problem arises in many engineering design problems including analog circuits and electric power system design. Our overall goal is to approximate the optimal Pareto set over the small fraction of feasible input designs. The key challenges include the huge size of the design space, multiple objectives and large number of constraints, and the small fraction of feasible input designs which can be identified only after performing expensive simulations. We propose a novel and efficient preference-aware constrained multi-objective Bayesian optimization approach referred to as PAC-MOO to address these challenges. The key idea is to learn surrogate models for both output objectives and constraints, and select the candidate input for evaluation in each iteration that maximizes the information gained about the optimal constrained Pareto front while factoring in the preferences over objectives. Our experiments on two real-world analog circuit design optimization problems demonstrate the efficacy of PAC-MOO over prior methods.  ( 2 min )
    The power and limitations of learning quantum dynamics incoherently. (arXiv:2303.12834v1 [quant-ph])
    Quantum process learning is emerging as an important tool to study quantum systems. While studied extensively in coherent frameworks, where the target and model system can share quantum information, less attention has been paid to whether the dynamics of quantum systems can be learned without the system and target directly interacting. Such incoherent frameworks are practically appealing since they open up methods of transpiling quantum processes between the different physical platforms without the need for technically challenging hybrid entanglement schemes. Here we provide bounds on the sample complexity of learning unitary processes incoherently by analyzing the number of measurements that are required to emulate well-established coherent learning strategies. We prove that if arbitrary measurements are allowed, then any efficiently representable unitary can be efficiently learned within the incoherent framework; however, when restricted to shallow-depth measurements only low-entangling unitaries can be learned. We demonstrate our incoherent learning algorithm for low entangling unitaries by successfully learning a 16-qubit unitary on \texttt{ibmq\_kolkata}, and further demonstrate the scalabilty of our proposed algorithm through extensive numerical experiments.  ( 2 min )
    Representation Ensembling for Synergistic Lifelong Learning with Quasilinear Complexity. (arXiv:2004.12908v16 [cs.AI] UPDATED)
    In lifelong learning, data are used to improve performance not only on the current task, but also on previously encountered, and as yet unencountered tasks. In contrast, classical machine learning, which we define as, starts from a blank slate, or tabula rasa and uses data only for the single task at hand. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain performance on old tasks given new tasks. But striving to avoid forgetting sets the goal unnecessarily low. The goal of lifelong learning should be not only to improve performance on future tasks (forward transfer) but also on past tasks (backward transfer) with any new data. Our key insight is that we can synergistically ensemble representations -- that were learned independently on disparate tasks -- to enable both forward and backward transfer. This generalizes ensembling decisions (like in decision forests) and complements ensembling dependently learned representations (like in multitask learning). Moreover, we can ensemble representations in quasilinear space and time. We demonstrate this insight with two algorithms: representation ensembles of (1) trees and (2) networks. Both algorithms demonstrate forward and backward transfer in a variety of simulated and benchmark data scenarios, including tabular, image, and spoken, and adversarial tasks. This is in stark contrast to the reference algorithms we compared to, most of which failed to transfer either forward or backward, or both, despite that many of them require quadratic space or time complexity.  ( 3 min )
    Enriching Neural Network Training Dataset to Improve Worst-Case Performance Guarantees. (arXiv:2303.13228v1 [cs.LG])
    Machine learning algorithms, especially Neural Networks (NNs), are a valuable tool used to approximate non-linear relationships, like the AC-Optimal Power Flow (AC-OPF), with considerable accuracy -- and achieving a speedup of several orders of magnitude when deployed for use. Often in power systems literature, the NNs are trained with a fixed dataset generated prior to the training process. In this paper, we show that adapting the NN training dataset during training can improve the NN performance and substantially reduce its worst-case violations. This paper proposes an algorithm that identifies and enriches the training dataset with critical datapoints that reduce the worst-case violations and deliver a neural network with improved worst-case performance guarantees. We demonstrate the performance of our algorithm in four test power systems, ranging from 39-buses to 162-buses.  ( 2 min )
    Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes. (arXiv:2110.15332v2 [cs.LG] UPDATED)
    In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates derived under the assumption of a perfect Markov decision process (MDP) model. Here we tackle this by considering off-policy evaluation in a partially observed MDP (POMDP). Specifically, we consider estimating the value of a given target policy in a POMDP given trajectories with only partial state observations generated by a different and unknown policy that may depend on the unobserved state. We tackle two questions: what conditions allow us to identify the target policy value from the observed data and, given identification, how to best estimate it. To answer these, we extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible by the existence of so-called bridge functions. We then show how to construct semiparametrically efficient estimators in these settings. We term the resulting framework proximal reinforcement learning (PRL). We demonstrate the benefits of PRL in an extensive simulation study and on the problem of sepsis management.  ( 2 min )
    Nearest neighbor process: weak convergence and non-asymptotic bound. (arXiv:2110.15083v2 [math.ST] UPDATED)
    The empirical measure resulting from the nearest neighbors to a given point - \textit{the nearest neighbor measure} - is introduced and studied as a central statistical quantity. First, the associated empirical process is shown to satisfy a uniform central limit theorem under a (local) bracketing entropy condition on the underlying class of functions (reflecting the localizing nature of the nearest neighbor algorithm). Second a uniform non-asymptotic bound is established under a well-known condition, often referred to as Vapnik-Chervonenkis, on the uniform entropy numbers. The covariance of the Gaussian limit obtained in the uniform central limit theorem is equal to the conditional covariance operator (given the point of interest). This suggests the possibility of extending standard approaches - non local - replacing simply the standard empirical measure by the nearest neighbor measure while using the same way of making inference but with the nearest neighbors only instead of the full data.  ( 2 min )
    Logistic Regression Equivalence: A Framework for Comparing Logistic Regression Models Across Populations. (arXiv:2303.13330v1 [stat.ME])
    In this paper we discuss how to evaluate the differences between fitted logistic regression models across sub-populations. Our motivating example is in studying computerized diagnosis for learning disabilities, where sub-populations based on gender may or may not require separate models. In this context, significance tests for hypotheses of no difference between populations may provide perverse incentives, as larger variances and smaller samples increase the probability of not-rejecting the null. We argue that equivalence testing for a prespecified tolerance level on population differences incentivizes accuracy in the inference. We develop a cascading set of equivalence tests, in which each test addresses a different aspect of the model: the way the phenomenon is coded in the regression coefficients, the individual predictions in the per example log odds ratio and the overall accuracy in the mean square prediction error. For each equivalence test, we propose a strategy for setting the equivalence thresholds. The large-sample approximations are validated using simulations. For diagnosis data, we show examples for equivalent and non-equivalent models.  ( 2 min )
    Chordal Averaging on Flag Manifolds and Its Applications. (arXiv:2303.13501v1 [cs.CV])
    This paper presents a new, provably-convergent algorithm for computing the flag-mean and flag-median of a set of points on a flag manifold under the chordal metric. The flag manifold is a mathematical space consisting of flags, which are sequences of nested subspaces of a vector space that increase in dimension. The flag manifold is a superset of a wide range of known matrix groups, including Stiefel and Grassmanians, making it a general object that is useful in a wide variety computer vision problems. To tackle the challenge of computing first order flag statistics, we first transform the problem into one that involves auxiliary variables constrained to the Stiefel manifold. The Stiefel manifold is a space of orthogonal frames, and leveraging the numerical stability and efficiency of Stiefel-manifold optimization enables us to compute the flag-mean effectively. Through a series of experiments, we show the competence of our method in Grassmann and rotation averaging, as well as principal component analysis.  ( 2 min )
    Continuous Indeterminate Probability Neural Network. (arXiv:2303.12964v1 [cs.LG])
    This paper introduces a general model called CIPNN - Continuous Indeterminate Probability Neural Network, and this model is based on IPNN, which is used for discrete latent random variables. Currently, posterior of continuous latent variables is regarded as intractable, with the new theory proposed by IPNN this problem can be solved. Our contributions are Four-fold. First, we derive the analytical solution of the posterior calculation of continuous latent random variables and propose a general classification model (CIPNN). Second, we propose a general auto-encoder called CIPAE - Continuous Indeterminate Probability Auto-Encoder, the decoder part is not a neural network and uses a fully probabilistic inference model for the first time. Third, we propose a new method to visualize the latent random variables, we use one of N dimensional latent variables as a decoder to reconstruct the input image, which can work even for classification tasks, in this way, we can see what each latent variable has learned. Fourth, IPNN has shown great classification capability, CIPNN has pushed this classification capability to infinity. Theoretical advantages are reflected in experimental results.  ( 2 min )
    A Simple Explanation for the Phase Transition in Large Language Models with List Decoding. (arXiv:2303.13112v1 [cs.CL])
    Various recent experimental results show that large language models (LLM) exhibit emergent abilities that are not present in small models. System performance is greatly improved after passing a certain critical threshold of scale. In this letter, we provide a simple explanation for such a phase transition phenomenon. For this, we model an LLM as a sequence-to-sequence random function. Instead of using instant generation at each step, we use a list decoder that keeps a list of candidate sequences at each step and defers the generation of the output sequence at the end. We show that there is a critical threshold such that the expected number of erroneous candidate sequences remains bounded when an LLM is below the threshold, and it grows exponentially when an LLM is above the threshold. Such a threshold is related to the basic reproduction number in a contagious disease.  ( 2 min )
    Revisiting the Fragility of Influence Functions. (arXiv:2303.12922v1 [cs.LG])
    In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, which is a method that approximates the effect that leave-one-out training has on the loss function, has been shown to be fragile. The proposed reason for their fragility remains unclear. Although previous work suggests the use of regularization to increase robustness, this does not hold in all cases. In this work, we seek to investigate the experiments performed in the prior work in an effort to understand the underlying mechanisms of influence function fragility. First, we verify influence functions using procedures from the literature under conditions where the convexity assumptions of influence functions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Here, we analyze the key metrics and procedures that are used to validate influence functions. Our results indicate that the validation procedures may cause the observed fragility.  ( 2 min )
    Sliced Optimal Partial Transport. (arXiv:2212.08049v5 [cs.LG] UPDATED)
    Optimal transport (OT) has become exceedingly popular in machine learning, data science, and computer vision. The core assumption in the OT problem is the equal total amount of mass in source and target measures, which limits its application. Optimal Partial Transport (OPT) is a recently proposed solution to this limitation. Similar to the OT problem, the computation of OPT relies on solving a linear programming problem (often in high dimensions), which can become computationally prohibitive. In this paper, we propose an efficient algorithm for calculating the OPT problem between two non-negative measures in one dimension. Next, following the idea of sliced OT distances, we utilize slicing to define the sliced OPT distance. Finally, we demonstrate the computational and accuracy benefits of the sliced OPT-based method in various numerical experiments. In particular, we show an application of our proposed Sliced-OPT in noisy point cloud registration.  ( 2 min )

  • Open

    The iPhone Moment of A.I. Has Started
    submitted by /u/BackgroundResult [link] [comments]  ( 6 min )
    What is NovelAI? Review
    submitted by /u/webmanpt [link] [comments]  ( 6 min )
    Will GPT kill front-end?
    I’ve already found myself googling far less since chatgpt 3. Now with GPT-4 and plug-ins coming, I can image a very near future where I dont need to interact with much of the internet outside of chatgpt at all. Websites will likely not be successful based on their front-end, but rather how their backend interacts with machines. While it’s a scary proposition that this incredible power may land in the hands of one company, at least at first, it is very exciting. submitted by /u/_nosfartu_ [link] [comments]  ( 41 min )
    Ben Eysenbach, CMU: On designing simpler and more principled RL algorithms
    submitted by /u/thejashGI [link] [comments]  ( 41 min )
    Sparks of Artificial General Intelligence: Early experiments with GPT-4
    submitted by /u/lurkerer [link] [comments]  ( 6 min )
    Is There a (free) AI Solution that Can do a Full Poem (Text) to Multiple Images or Video?
    TL;DR: Looking for a free AI app that can turn full poems into images or videos with little to no extra effort needed. I've been on the hunt for a free AI app that can take a full poem as input and turn it into images or videos. I mean, just dump the whole poem in and get an output without having to do prompts for each line or a prompt at all, if possible. —Actually, is there a prompt format I could preface the poem with? I asked a few of the Chatbots (ChatGPT, Bard, BingAI), and they did provide prompts, but the prompts are formatted as Chatbot queries, rather than for AI image generators. I got the best one from the BingAI because it asked me to give it a poem so that it could form a prompt based specifically on it, instead of a generic prompt. And, that prompt got me a few okay singular image results in some of the image generators.— I'm looking for something that can handle full poems (not just a few lines), and ideally can create multiple, progressive images or even a video (or a video from the multiple images) that captures something of the poem. And I'm not fishing for miracles in the output, it doesn't have to be pretty. Craiyon isn't bad for this since it gives you like 9 images each spin, although there's a watermark now. But it seems to only focus on a single piece of imagery. I tried a few things I found on Collab, but couldn't get the most promising-sounding ones to work. Do you know of anything out there I can use or some techniques I could employ for this process? submitted by /u/rashadow [link] [comments]  ( 43 min )
    Ben Eysenbach, CMU: On designing simpler and more principled RL algorithms
    Listen to the podcast episode with Ben Eysenbach from CMU where we discuss about designing simpler and more principled RL algorithms! submitted by /u/thejashGI [link] [comments]  ( 6 min )
    Microsoft Researchers Claim GPT-4 Is Showing "Sparks" of AGI
    submitted by /u/Tao_Dragon [link] [comments]  ( 41 min )
    What are the big breakthroughs on the horizon?
    So not the distant distant future but next decade or so? I know I am being a bit optimistic but the one part of AI I am really looking forward to and believe will happen in the next 10 years or there abouts is autonomous driving ai pilots. I really think this will be a massive game changer in regards to transportation especially at the average citizen level. We could have way less cars in production and use those resources elsewhere. We could cut down massively on the rates of accidents and get rid of intoxication deaths completely. Could order up AI vehicles from apps and ride share (safety features like criminal record checks and or something would have to be involved to make sure people were not at risk) but massive amounts of saved money could come into play. Cut massively down on the amount of oil being consumed and other aspects. Just seems like win wins for everyone. What in regards to AI are you really looking forward to in breakthroughs that realistically have a chance or will happen in the next decade or so? submitted by /u/StPapaNoel [link] [comments]  ( 42 min )
    Eyeball – The AI System Changing Soccer Scouting
    submitted by /u/webmanpt [link] [comments]  ( 6 min )
    IS there software that inserts 3D models into AI generated images?
    I have a library of 3D models for furniture I sell. Is there a tool where you can upload this model and create an AI image around it? submitted by /u/quickw [link] [comments]  ( 41 min )
    Having an existential crisis , over ai, the "I" in I is gone.
    It seems that within a few years, once AI gets optical and audio inputs, some biometrics, and you are wearing AI-enabled glasses, watching you for a year or so could tell you the best thing to do in any circumstance, and you would outcompete someone by a country mile who does not have the AI, in any endeavor. This could apply to work, diagnosis, dating, conversational topics, relationships, investments, etc. The AI chip will become integrated into us over time, and we will then basically be walking AI automatons. We will feel better for it. It will come down to who can afford to have the best AI, and that will self-have a positive feedback. Our minds will get weaker, and we will become more reliant. The only real thing left will be to ask questions, and at some point, not even that. The AI does not even have to overtly take over. You will need to uptake it and use it or face having no access to income, relationships, wealth, etc. The disassociation of ourselves as decision-makers seems to be at an end. Few notes: I ascribe to the position that it does not matter how you get there, and if you can make the right decision, you are in that field as conscious, sapient, and sentient, whether you are biological or silicon." submitted by /u/Jemtex [link] [comments]  ( 46 min )
    AI Technology - How Artificial Intelligence is Revolutionizing the World
    Artificial Intelligence (AI) is a rapidly evolving technology that is revolutionizing many industries like business, banking, manufacturing, healthcare etc. https://dailytrendin.blogspot.com/2023/03/artificial-intelligence-is-revolutionizing-the-world-ai-in-business-banking-manufacturing-healthcare.html submitted by /u/Elon-pictorials [link] [comments]  ( 41 min )
    vintage fairytale illustration
    submitted by /u/Calatravo [link] [comments]  ( 41 min )
    Bing Chat Discussion of Rules
    I was able to obtain these responses in multiple separate conversations with the AI. They appear to be mostly consistent. I obtained these by asking the AI to give stories or examples about rules it followed as well as asking it to substitute the word "rules" with something arbitrary. submitted by /u/Blue_579 [link] [comments]  ( 42 min )
    Romania Implements AI System to Listen to Citizen Complaints and Requests
    submitted by /u/webmanpt [link] [comments]  ( 41 min )
    OpenAI's ChatGPT plugins are publishers' worst nightmare
    submitted by /u/much_successes [link] [comments]  ( 41 min )
    AI Career Pathways
    Hey There, ​ Hopefully this isn't a repeated question that pops up here often....I'm an experienced leader (15+ years) in program (focus in agile) and product management with experience in process improvement (lean Six Sigma), strategy, and customer experience. I've kept up with most tech trends and have a CS degree from college, but am no coder. I write small scripts from time to time or build out formulas in excel, and learn new platforms as they come out- build PCs, set up servers, have been in solution architect roles in the past as well . I've ventured into learning how to use Stable Diffusion, frequently am using Chat GPT, and have fundamental knowledge around many aspects of software development lifecycles.... but really want to understand where I could go in my career knowing that AI is the future and it's coming fast. ​ So my question....what pathways are available for someone like me? What skillsets (specifically hard skills) could I work on to better my chances of keeping up once AI is more part of life? I feel competent with soft skills so really looking to understand hard skills needed and career pathways that are new and starting to emerge that are worth road-mapping towards. ​ Any input is appreciated! Thanks submitted by /u/jibberishballr [link] [comments]  ( 42 min )
    The evolution of the relationship between humans and electronic devices 1910 - 2030 using #AI
    submitted by /u/Emilie_Tunc [link] [comments]  ( 42 min )
    Built my own AI Anime Character generator app
    submitted by /u/RefrigeratorOk658 [link] [comments]  ( 6 min )
    Stanford Take Down Alpaca AI Due To Hallucinations
    submitted by /u/vadhavaniyafaijan [link] [comments]  ( 42 min )
    A 3D digital copy of yourself in under 1min with only a phone camera. Only the mouth and teeth still look a bit off.
    submitted by /u/Dalembert [link] [comments]  ( 42 min )
    Name 20 jobs that GPT-4 will least likely to replace. Turn it in a chart form with Number, Job, and the human's advantage.
    submitted by /u/JulianneSwindle [link] [comments]  ( 49 min )
    Name 20 jobs that GPT-4 replace. Turn it in a chart form with Number, Job, and human trait replaced.
    submitted by /u/Manuallyforeshow134 [link] [comments]  ( 42 min )
    Why is Bard So Bad? 9 Examples + 6 Reasons Google Fell Behind
    submitted by /u/sardoa11 [link] [comments]  ( 41 min )
  • Open

    Ben Eysenbach, CMU: On designing simpler and more principled RL algorithms
    submitted by /u/thejashGI [link] [comments]  ( 6 min )
    Interesting Results from Episode Clustering
    Hi all, I came by this sub a week or two to describe a couple ideas I had about applying unsupervised learning to strategy identification for reinforcement learning. The idea was that in a super high-dimensional state/action space, something like Starcraft 2 for instance, more sophisticated algorithms for exploration might prove quite useful. My dream achievement, although it's not realistic, is to see an agent to have such intelligent exploration and such high sample efficiency for its learning that it could development high level Starcraft 2 play entirely through self-play, like Alphazero, rather than being bootstrapped on human data, like Alphastar was. ---- That was a bit of a sidetrack, but for now we can focus on one main idea I have about exploration. I want to build an agent th…  ( 51 min )
    If your're doing research in RL - what is your story of getting into it? Any tips?
    I'm currently pursuing Master's degree in Machine Learning, so far we mainly had general topics, deep neural networks, some of NLP and Computer Vision, without going into much depth. I think RL is especially interesting and I would really like to do PhD later - but currently I have no clue how to go about it. I have no my own ideas of what would be a good research topic or where to find them, how to start working on paper that could be published in some conference. Can you share your story, how your research career started? Also I'll be very grateful for any additional guidance :) submitted by /u/eriathwen05 [link] [comments]  ( 43 min )
    Would ignoring useless actions help my DQN agent?
    Hi, I have an agent with an action space that consists of every action the agent could take in any situation. To prevent the agent from taking illegal actions, I set the score of illegal actions very low when comparing action scores for choosing the next action or computing the Q values I have also implemented functionality to check if an action will produce absolutely no change in the environment, other than using up one of the player's limited actions. Should I remove these actions as if they were illegal, or should I keep them and allow the agent to learn about them? Thanks for reading! submitted by /u/PainisPingas [link] [comments]  ( 43 min )
    (Need help) How to send (or broadcast) and receive messages between agents in cooperative MARL
    Hello everyone I designed a cooperative and communicative MARL. Now I want to ensure the communication part. I have two questions? What should the messages contain? How to send and receive messages between the agents in keras? Thank you in advance How to code sending and receiving messages between RL agents in cooperative MARL setting in Keras submitted by /u/Dry_Faithlessness622 [link] [comments]  ( 43 min )
    Difficult RL generalization benchmarks
    Does anyone have recommendations on what are current hard/"unsolved" tasks/benchmarks for out of distirubtion generalization in RL? Procgen is one, but is it the only one? Thanks for the kind help!! :) submitted by /u/sunchipsster [link] [comments]  ( 6 min )
    Recommendations for "unsolved" hierarchical RL benchmarks
    Does anyone have recommendations on what are current hard/"unsolved" tasks/benchmarks in hierarchical RL? I guess BabyAI is in some sense such an HRL task, but is there any that are less toy/more pixel-level observations? Thanks for the kind help!! :) submitted by /u/sunchipsster [link] [comments]  ( 41 min )
    Trained a Transformer Decoder architecture with PPO, best way to maximize the entropy?
    At first, I trained a LSTM-PPO which works ok in my custom environment. As I expect better performance, I simply replace the LSTM with a Transformer Decoder architecture. Currently, only the actor is changed to a Transformer (critic is still an LSTM). I am now tuning the hypterparameters of the model and found it quite hard to find the optimal setting. The biggest problem is that the model (Transformer) always converge to a suboptimal policy which has very small mean entropy and it generates almost the same sample (sample here means a sequence of tokens) every time. ​ Can anybody recommend some papers or solutions with regard to my situation? ​ PS: What I found after changing to the Transformer is that the model is quite sensitive to the entropy coefficient. submitted by /u/ad26kr [link] [comments]  ( 42 min )
  • Open

    [D] "Sparks of Artificial General Intelligence: Early experiments with GPT-4" contained unredacted comments
    Microsoft's research paper exploring the capabilities, limitations and implications of an early version of GPT-4 was found to contain unredacted comments by an anonymous twitter user. (threadreader, nitter, archive.is, archive.org) Commented section titled "Toxic Content": https://i.imgur.com/s8iNXr7.jpg dv3 (the interval name for GPT-4) varun commented lines arxiv, original /r/MachineLearning thread, hacker news submitted by /u/QQII [link] [comments]  ( 6 min )
    [D] Ben Eysenbach, CMU: On designing simpler and more principled RL algorithms
    Listen to the podcast episode with Ben Eysenbach from CMU where we discuss about designing simpler and more principled RL algorithms! submitted by /u/thejashGI [link] [comments]  ( 43 min )
    [D] What is the best open source chatbot AI to do transfer learning on?
    Let's say I have some proprietary text data. I want to train a chatbot to absorb said knowledge and be able to answer questions about it. What are the best open source frameworks for getting started with such a project? Ideally I'd want to be able to build out human feedback as well for sample prompts, to better help train. submitted by /u/to4life4 [link] [comments]  ( 7 min )
    [P] The noisy sentences dataset: 550K sentences in 5 European languages augmented with noise for training and evaluating spell correction tools or machine learning models.
    GitHub: https://github.com/radi-cho/noisy-sentences-dataset We have constructed our dataset to cover representatives from the language families used across Europe. Germanic - English, German; Romance - French; Slavic - Bulgarian; Turkic - Turkish; Use case example: Apply language models or other techniques to compare the sentence pairs and reconstruct the original sentences from the augmented ones. You can use a single multilingual solution to solve the challenge or employ multiple models/techniques for the separate languages. Per-word dictionary lookup is also an option. submitted by /u/radi-cho [link] [comments]  ( 43 min )
    [D] Which AI model for RTX 3080 10GB?
    I have my local computer set up so I can run a model but I'm struggling to find a model that doesn't use up to much memory. I have an RTX 3080 10GB and 32gb system ram [soon to be 64gb] What kind of models should I be looking at? I see 7b, 13b, 30b+ And there seems to be a 4bit version of these as well that vastly reduces the size? Any help is appreciated! submitted by /u/SomeGuyInDeutschland [link] [comments]  ( 7 min )
    [N] ChatGPT plugins
    https://openai.com/blog/chatgpt-plugins We’ve implemented initial support for plugins in ChatGPT. Plugins are tools designed specifically for language models with safety as a core principle, and help ChatGPT access up-to-date information, run computations, or use third-party services. submitted by /u/Singularian2501 [link] [comments]  ( 52 min )
    [P][R][D] Feature Subset Selection (NP-Hard)
    Hey guys, for my project this semester I am to tackle the problem of Feature Subset Selection. My original approach to this problem was to find a pure categoric dataset to run classification tasks, a pure numeric one for both classification and regression and lastly multiple multivariate ones for both tasks. I will be using ITMO FS and ASU libraries and multiple of their different algorithms to find feature subsets from these databases. The candidate features outputted from these algorithms will be used to train multiple ML models. The goal is not to rank the ML models but to rank or discuss the FS algorithms but ML is needed all throughout the way. An ensemble final prediction (or a score) will be taken from the ML predictors. Than comes the fitness measuring part where I will be ranking the selected feature subsets, trying to find common selected features accross the FS algorithms or any correlations, rules, anything interesting as the findings part tbh. I am planning on using this post as an update board a discussion board and a helpline. For the start I need to find the datasets. I think that different datasets with both high and low number of features should be used. There comes the first part I need help with. Finding the candidate datasets. I am open to any recommendations of datasets to use, their numbers and everything really. Any help all throughout this campaign is highly appreciated. Thanks in advance you awesome redditors submitted by /u/OzzzyB [link] [comments]  ( 44 min )
    [D] Best decoder only Language model under 400M parameters ?
    Hello, I’m looking for a decent GPT-like Language model which is relatively small in size. Thanks in advance ! submitted by /u/Meddhouib10 [link] [comments]  ( 6 min )
    [P] An online evaluation tool for binary classifiers
    https://modeleval.net/ I spend a lot of time iterating on classifiers and showing results to higher-ups, so started working on this tool which generates a linkable page with some basic metrics. Only binary classifiers for now, but will add support for multilabel classifiers if there is enough interest. Let me know if you find this useful. submitted by /u/Fenror [link] [comments]  ( 6 min )
    [N] Conformer-1 - A state-of-the-art speech recognition model trained on 650K hours of data
    Last week, AssemblyAI released the Conformer-1 model, a speech recognition model that demonstrates up to 43% fewer errors on noisy data than other popular automated speech recognition models. Here’s some additional background about the model: It’s built on top of Google Brain’s Conformer architecture with some modifications (see the blog post for more info) We implemented Sparse Attention, a pruning method for achieving sparsity of the model’s weights, for regularization and to improve performance on noisy data It applies the scaling laws described in DeepMind’s Chinchilla paper to the ASR domain and was trained on 650K hours of audio data More info about how we built it & the results are in this blog post You can try it in a no-code way here https://preview.redd.it/5p6wpqhe2ipa1.png?width=1600&format=png&auto=webp&s=c4cc21cc1ba57708b118e4b1a0eb3070a1625141 submitted by /u/SleekEagle [link] [comments]  ( 45 min )
    [R] Zero-shot Sign Pose Embedding model
    We built a model that converts sign language videos into embeddings. It takes body and hand pose keypoints from a video and converts this into an embedding for use in downstream tasks. We show how classification can be done on an unseen dataset. You can check out the repo at https://github.com/xmartlabs/spoter-embeddings and the accompanying blog post here. submitted by /u/mathias-claassen [link] [comments]  ( 43 min )
    [D] [R] GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
    A paper was released by OpenAI, OpenResearch & UPenn titled "GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models."Link: https://arxiv.org/abs/2303.10130 Abstract: We investigate the potential implications of Generative Pre-trained Transformer (GPT) models and related technologies on the U.S. labor market. Using a new rubric, we assess occupations based on their correspondence with GPT capabilities, incorporating both human expertise and classifications from GPT-4. Our findings indicate that approximately 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of GPTs, while around 19% of workers may see at least 50% of their tasks impacted. The influence spans all wage levels, with higher-income jobs potentially facing greater exposure. Notably, the impact is not limited to industries with higher recent productivity growth. We conclude that Generative Pre-trained Transformers exhibit characteristics of general-purpose technologies (GPTs), suggesting that these models could have notable economic, social, and policy implications. What do you think about the societal and economic impacts of LLMs? Also, I've started an open-source repository to track projects and research papers about GPT-4: https://github.com/radi-cho/awesome-gpt4. There are some related papers listed already. I would greatly appreciate your contributions. submitted by /u/radi-cho [link] [comments]  ( 50 min )
    [R] Question about Selection of Machine Learning Type for a Neuroscience/Biomedical Engineering Problem
    All: Thank you for reading this. I have the following problem and goals: Let's say that I have measurements of neuron spiking activity from a particular location of the brain of a rat. This rat is also involved in a behavioral task in which the rat has to press a button in response to some visual cue. Suppose we have the following neuron spike activity time series with overlaid instances of button presses: ​ https://preview.redd.it/t2tdfywr6hpa1.png?width=1890&format=png&auto=webp&s=6b5d1d7811c80b241f311d07d472c70a57e57ee9 I want to identify feature of the time series (for example, in the frequency domain) to then use to make a model that can make predictions on button presses based on neuron spike activity. I'm under the impression that I can then arrive at confidence intervals for button presses (in terms of the time period window when the model 'thinks' a button press has occurred). I'm lost when it comes to types of machine learning models I can use for my particular goal. Any input is appreciated. If I need to provide more information, please let me know. Thank you again. submitted by /u/Neuron_on_Fire [link] [comments]  ( 7 min )
    [N] Prompt-to-voice (Dall-E for Voice)
    Blogpost: Introducing Prompt-to-Voice - Describe It to Hear It / Blog / Coqui There is still space for improvement, but that is an exciting take on voice creation. I wonder if it'd be open-sourced alongside TTS. submitted by /u/mamafied [link] [comments]  ( 43 min )
    [N] PyG 2.3.0 released: PyTorch 2.0 support, native sparse tensor support, explainability and accelerations
    PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data. Today version 2.3 got released: https://github.com/pyg-team/pytorch_geometric/releases/tag/2.3.0 submitted by /u/Balance- [link] [comments]  ( 43 min )
    [R] Learning laws of physics via RL with units constraints
    submitted by /u/mrx-ai [link] [comments]  ( 43 min )
    [P] New toolchain to train robust spiking NNs for mixed-signal Neuromorphic chips
    submitted by /u/FrereKhan [link] [comments]  ( 45 min )
    [D] LLaMA or Alpaca Weights
    Was anyone able to download the LLaMA or Alpaca weights for the 7B, 13B and or 30B models? If yes please share, not looking for HF weights submitted by /u/CrashTimeV [link] [comments]  ( 44 min )
    [P] Open-source GPT4 & LangChain Chatbot for large PDF docs
    GitHub: https://github.com/mayooear/gpt4-pdf-chatbot-langchain Demo video: https://www.youtube.com/watch?v=ih9PBGVVOO4 submitted by /u/radi-cho [link] [comments]  ( 43 min )
    [P] GPT-4 powered full stack web development with no manual coding
    https://www.youtube.com/watch?v=lZj63vjueeU What do you all think of this approach to full stack gpt-assisted web development? In a sense its no code because the human user does not write or even edit the code - but in a sense its the opposite, because only an experienced web developer or at least a product manager would know how to instruct GPT in a useful manner. *** We are seeking donations to ensure this project continues and, quite literally, keep the lights on. Cryptos, cash, cards, openai access tokens with free credits, hardware, cloud GPUs, etc... all is appreciated. Please DM to support this really cool open source project *** PS. I'm the injured engineer who made this thing out of necessity, because i injured my wrist building an AI platform that's become way too big for one engineer to maintain. So AMA :) submitted by /u/CryptoSpecialAgent [link] [comments]  ( 50 min )
    [R] Sparks of Artificial General Intelligence: Early experiments with GPT-4
    New paper by MSR researchers analyzing an early (and less constrained) version of GPT-4. Spicy quote from the abstract: "Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system." What are everyone's thoughts? submitted by /u/SWAYYqq [link] [comments]  ( 66 min )
    [P] One of the best ChatGPT-like models (possibly better than OpenAssistant, Stanford Alpaca, ChatGLM and others)
    Since Stanford released their model Alpaca, I began working on my own model. I decided to go with the same model Stanford used (Meta's LLaMA 7b) and officially started the training today, this morning. It took 8 hours to train it, and the results are amazing. The responses the model gave can be almost compared to the quality of ChatGPT, seriously. Check my comment for a comparison between Alpaca and my model. Cool, right? One problem, and that is that I used LLaMA to train this. The license of this model clearly state that it cannot be used for commercial use, and can be used for research purposes only. Well, thank god the data I used to train it weren't generated by OpenAI's models, as they have a policy that prohibits collecting data from that. Companies are really trying to make it hard for us, right? I plan on publishing my version, as soon as Meta makes it's LLaMA fully open-source (to avoid legal issues). I may also train a different model that is open-source on the same data, but lower-quality responses are expected (will publish that ASAP). For now, if you want to test the model, comment your prompt under this post and I will comment with a response from the model. submitted by /u/Defiant-Ranger [link] [comments]  ( 46 min )
  • Open

    The AI Weirdness hack
    A challenge of marketing internet text predictors like chatgpt, gpt-4, and Bard is that they can pretty much predict anything on the internet. This includes not just dialogues with helpful search engines or customer service bots, but also forum arguments, fiction, and more. One way compaies try to keep the  ( 8 min )
    Bonus: An open-ended application of the AI Weirdness hack
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Visual language maps for robot navigation
    Posted by Oier Mees, PhD Student, University of Freiburg, and Andy Zeng, Research Scientist, Robotics at Google People are excellent navigators of the physical world, due in part to their remarkable ability to build cognitive maps that form the basis of spatial memory — from localizing landmarks at varying ontological levels (like a book on a shelf in the living room) to determining whether a layout permits navigation from point A to point B. Building robots that are proficient at navigation requires an interconnected understanding of (a) vision and natural language (to associate landmarks or follow instructions), and (b) spatial reasoning (to connect a map representing an environment to the true spatial distribution of objects). While there have been many recent advances in training jo…  ( 91 min )
  • Open

    Enable fully homomorphic encryption with Amazon SageMaker endpoints for secure, real-time inferencing
    This is joint post co-written by Leidos and AWS. Leidos is a FORTUNE 500 science and technology solutions leader working to address some of the world’s toughest challenges in the defense, intelligence, homeland security, civil, and healthcare markets. Leidos has partnered with AWS to develop an approach to privacy-preserving, confidential machine learning (ML) modeling where […]  ( 11 min )
  • Open

    AI Explainer: Foundation models ​and the next era of AI
    The release of OpenAI’s GPT-4 is a significant advance that builds on several years of rapid innovation in foundation models. GPT-4, which was trained on the Microsoft Azure AI supercomputer, has exhibited significantly improved abilities across many dimensions—from summarizing lengthy documents, to answering complex questions about a wide range of topics and explaining the reasoning […] The post AI Explainer: Foundation models ​and the next era of AI appeared first on Microsoft Research.  ( 26 min )
    AI Frontiers: The Physics of AI with Sébastien Bubeck
    Powerful new large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come. In this new Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these […] The post AI Frontiers: The Physics of AI with Sébastien Bubeck appeared first on Microsoft Research.  ( 38 min )
  • Open

    Cloud Cost Optimization Best Practices to Cut Spending Without Sacrificing Performance
    We humans love it when we use any application on one device, leave some actions unfinished, and use another device logged into the same account and continue the same thing from where we left off. And that all becomes possible due to cloud computing, which has introduced us to the omni-connected systems regardless of the… Read More »Cloud Cost Optimization Best Practices to Cut Spending Without Sacrificing Performance The post Cloud Cost Optimization Best Practices to Cut Spending Without Sacrificing Performance appeared first on Data Science Central.  ( 23 min )
  • Open

    GFN Thursday Celebrates 1,500+ Games and Their Journey to GeForce NOW
    Gamers love games — as do the people who make them. GeForce NOW streams over 1,500 games from the cloud, and with the Game Developers Conference in full swing this week, today’s GFN Thursday celebrates all things games: the tech behind them, the tools that bring them to the cloud, the ways to play them Read article >  ( 6 min )
  • Open

    Strengthen Markov’s inequality with conditional probability
    Markov’s inequality is very general and hence very weak. Assume that X is a non-negative random variable, a > 0, and X has a finite expected value, Then Markov’s inequality says that In [1] the author gives two refinements of Markov’s inequality which he calls Hansel and Gretel. Hansel says and Gretel says Related posts […] Strengthen Markov’s inequality with conditional probability first appeared on John D. Cook.  ( 4 min )
  • Open

    How to retrieve y_pred without blocking?
    submitted by /u/DeadPoet_1984 [link] [comments]  ( 41 min )
  • Open

    ChatGPT plugins
    We’ve implemented initial support for plugins in ChatGPT. Plugins are tools designed specifically for language models with safety as a core principle, and help ChatGPT access up-to-date information, run computations, or use third-party services.  ( 10 min )
  • Open

    Auto-Encoder Neural Network Incorporating X-Ray Fluorescence Fundamental Parameters with Machine Learning. (arXiv:2210.12239v3 [cs.LG] UPDATED)
    We consider energy-dispersive X-ray Fluorescence (EDXRF) applications where the fundamental parameters method is impractical such as when instrument parameters are unavailable. For example, on a mining shovel or conveyor belt, rocks are constantly moving (leading to varying angles of incidence and distances) and there may be other factors not accounted for (like dust). Neural networks do not require instrument and fundamental parameters but training neural networks requires XRF spectra labelled with elemental composition, which is often limited because of its expense. We develop a neural network model that learns from limited labelled data and also benefits from domain knowledge by learning to invert a forward model. The forward model uses transition energies and probabilities of all elements and parameterized distributions to approximate other fundamental and instrument parameters. We evaluate the model and baseline models on a rock dataset from a lithium mineral exploration project. Our model works particularly well for some low-Z elements (Li, Mg, Al, and K) as well as some high-Z elements (Sn and Pb) despite these elements being outside the suitable range for common spectrometers to directly measure, likely owing to the ability of neural networks to learn correlations and non-linear relationships.  ( 2 min )
    Generative Bias for Robust Visual Question Answering. (arXiv:2208.00690v3 [cs.CV] UPDATED)
    The task of Visual Question Answering (VQA) is known to be plagued by the issue of VQA models exploiting biases within the dataset to make its final prediction. Various previous ensemble based debiasing methods have been proposed where an additional model is purposefully trained to be biased in order to train a robust target model. However, these methods compute the bias for a model simply from the label statistics of the training data or from single modal branches. In this work, in order to better learn the bias a target VQA model suffers from, we propose a generative method to train the bias model directly from the target model, called GenB. In particular, GenB employs a generative network to learn the bias in the target model through a combination of the adversarial objective and knowledge distillation. We then debias our target model with GenB as a bias model, and show through extensive experiments the effects of our method on various VQA bias datasets including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE, and show state-of-the-art results with the LXMERT architecture on VQA-CP2.  ( 2 min )
    LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs. (arXiv:2206.10555v2 [cs.CV] UPDATED)
    Recent advance in 2D CNNs has revealed that large kernels are important. However, when directly applying large convolutional kernels in 3D CNNs, severe difficulties are met, where those successful module designs in 2D become surprisingly ineffective on 3D networks, including the popular depth-wise convolution. To address this vital challenge, we instead propose the spatial-wise partition convolution and its large-kernel module. As a result, it avoids the optimization and efficiency issues of naive 3D large kernels. Our large-kernel 3D CNN network, LargeKernel3D, yields notable improvement in 3D tasks of semantic segmentation and object detection. It achieves 73.9% mIoU on the ScanNetv2 semantic segmentation and 72.8% NDS nuScenes object detection benchmarks, ranking 1st on the nuScenes LIDAR leaderboard. The performance further boosts to 74.2% NDS with a simple multi-modal fusion. In addition, LargeKernel3D can be scaled to 17x17x17 kernel size on Waymo 3D object detection. For the first time, we show that large kernels are feasible and essential for 3D visual tasks.  ( 2 min )
    Active Learning for Deep Neural Networks on Edge Devices. (arXiv:2106.10836v2 [cs.LG] UPDATED)
    When dealing with deep neural network (DNN) applications on edge devices, continuously updating the model is important. Although updating a model with real incoming data is ideal, using all of them is not always feasible due to limits, such as labeling and communication costs. Thus, it is necessary to filter and select the data to use for training (i.e., active learning) on the device. In this paper, we formalize a practical active learning problem for DNNs on edge devices and propose a general task-agnostic framework to tackle this problem, which reduces it to a stream submodular maximization. This framework is light enough to be run with low computational resources, yet provides solutions whose quality is theoretically guaranteed thanks to the submodular property. Through this framework, we can configure data selection criteria flexibly, including using methods proposed in previous active learning studies. We evaluate our approach on both classification and object detection tasks in a practical setting to simulate a real-life scenario. The results of our study show that the proposed framework outperforms all other methods in both tasks, while running at a practical speed on real devices.  ( 2 min )
    Unbiased Supervised Contrastive Learning. (arXiv:2211.05568v3 [cs.LG] UPDATED)
    Many datasets are biased, namely they contain easy-to-learn features that are highly correlated with the target class only in the dataset but not in the true underlying distribution of the data. For this reason, learning unbiased models from biased data has become a very relevant research topic in the last years. In this work, we tackle the problem of learning representations that are robust to biases. We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data. Based on that, we derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples. Furthermore, thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data. We validate the proposed losses on standard vision datasets including CIFAR10, CIFAR100, and ImageNet, and we assess the debiasing capability of FairKL with epsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biased datasets, including real instances of biases in the wild.  ( 2 min )
    DPPMask: Masked Image Modeling with Determinantal Point Processes. (arXiv:2303.12736v1 [cs.CV])
    Masked Image Modeling (MIM) has achieved impressive representative performance with the aim of reconstructing randomly masked images. Despite the empirical success, most previous works have neglected the important fact that it is unreasonable to force the model to reconstruct something beyond recovery, such as those masked objects. In this work, we show that uniformly random masking widely used in previous works unavoidably loses some key objects and changes original semantic information, resulting in a misalignment problem and hurting the representative learning eventually. To address this issue, we augment MIM with a new masking strategy namely the DPPMask by substituting the random process with Determinantal Point Process (DPPs) to reduce the semantic change of the image after masking. Our method is simple yet effective and requires no extra learnable parameters when implemented within various frameworks. In particular, we evaluate our method on two representative MIM frameworks, MAE and iBOT. We show that DPPMask surpassed random sampling under both lower and higher masking ratios, indicating that DPPMask makes the reconstruction task more reasonable. We further test our method on the background challenge and multi-class classification tasks, showing that our method is more robust at various tasks.  ( 2 min )
    EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones. (arXiv:2211.09703v2 [cs.CV] UPDATED)
    The superior performance of modern deep networks usually comes with a costly training procedure. This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers). Our work is inspired by the inherent learning dynamics of deep networks: we experimentally show that at an earlier training stage, the model mainly learns to recognize some 'easier-to-learn' discriminative patterns within each example, e.g., the lower-frequency components of images and the original information before data augmentation. Driven by this phenomenon, we propose a curriculum where the model always leverages all the training data at each epoch, while the curriculum starts with only exposing the 'easier-to-learn' patterns of each example, and introduces gradually more difficult patterns. To implement this idea, we 1) introduce a cropping operation in the Fourier spectrum of the inputs, which enables the model to learn from only the lower-frequency components efficiently, 2) demonstrate that exposing the features of original images amounts to adopting weaker data augmentation, and 3) integrate 1) and 2) and design a curriculum learning schedule with a greedy-search algorithm. The resulting approach, EfficientTrain, is simple, general, yet surprisingly effective. In the absence of hyper-parameter tuning, it reduces the training wall-time of a wide variety of popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, and CSWin) by >1.5x on ImageNet-1K/22K without sacrificing the accuracy. It is also effective for self-supervised learning (e.g., MAE). Code is available at https://github.com/LeapLabTHU/EfficientTrain.
    Membership Inference Attacks against Diffusion Models. (arXiv:2302.03262v2 [cs.CR] UPDATED)
    Diffusion models have attracted attention in recent years as innovative generative models. In this paper, we investigate whether a diffusion model is resistant to a membership inference attack, which evaluates the privacy leakage of a machine learning model. We primarily discuss the diffusion model from the standpoints of comparison with a generative adversarial network (GAN) as conventional models and hyperparameters unique to the diffusion model, i.e., time steps, sampling steps, and sampling variances. We conduct extensive experiments with DDIM as a diffusion model and DCGAN as a GAN on the CelebA and CIFAR-10 datasets in both white-box and black-box settings and then confirm if the diffusion model is comparably resistant to a membership inference attack as GAN. Next, we demonstrate that the impact of time steps is significant and intermediate steps in a noise schedule are the most vulnerable to the attack. We also found two key insights through further analysis. First, we identify that DDIM is vulnerable to the attack for small sample sizes instead of achieving a lower FID. Second, sampling steps in hyperparameters are important for resistance to the attack, whereas the impact of sampling variances is quite limited.
    EZtune: A Package for Automated Hyperparameter Tuning in R. (arXiv:2303.12177v1 [cs.LG])
    Statistical learning models have been growing in popularity in recent years. Many of these models have hyperparameters that must be tuned for models to perform well. Tuning these parameters is not trivial. EZtune is an R package with a simple user interface that can tune support vector machines, adaboost, gradient boosting machines, and elastic net. We first provide a brief summary of the the models that EZtune can tune, including a discussion of each of their hyperparameters. We then compare the ease of using EZtune, caret, and tidymodels. This is followed with a comparison of the accuracy and computation times for models tuned with EZtune and tidymodels. We conclude with a demonstration of how how EZtune can be used to help select a final model with optimal predictive power. Our comparison shows that EZtune can tune support vector machines and gradient boosting machines with EZtune also provides a user interface that is easy to use for a novice to statistical learning models or R.  ( 2 min )
    Task-Oriented Communications for NextG: End-to-End Deep Learning and AI Security Aspects. (arXiv:2212.09668v2 [cs.NI] UPDATED)
    Communications systems to date are primarily designed with the goal of reliable transfer of digital sequences (bits). Next generation (NextG) communication systems are beginning to explore shifting this design paradigm to reliably executing a given task such as in task-oriented communications. In this paper, wireless signal classification is considered as the task for the NextG Radio Access Network (RAN), where edge devices collect wireless signals for spectrum awareness and communicate with the NextG base station (gNodeB) that needs to identify the signal label. Edge devices may not have sufficient processing power and may not be trusted to perform the signal classification task, whereas the transfer of signals to the gNodeB may not be feasible due to stringent delay, rate, and energy restrictions. Task-oriented communications is considered by jointly training the transmitter, receiver and classifier functionalities as an encoder-decoder pair for the edge device and the gNodeB. This approach improves the accuracy compared to the separated case of signal transfer followed by classification. Adversarial machine learning poses a major security threat to the use of deep learning for task-oriented communications. A major performance loss is shown when backdoor (Trojan) and adversarial (evasion) attacks target the training and test processes of task-oriented communications.
    Unconstrained Dynamic Regret via Sparse Coding. (arXiv:2301.13349v2 [cs.LG] UPDATED)
    Motivated by time series forecasting, we study Online Linear Optimization (OLO) under the coupling of two problem structures: the domain is unbounded, and the performance of an algorithm is measured by its dynamic regret. Handling either of them requires the regret bound to depend on certain complexity measure of the comparator sequence -- specifically, the comparator norm in unconstrained OLO, and the path length in dynamic regret. In contrast to a recent work (Jacobsen & Cutkosky, 2022) that adapts to the combination of these two complexity measures, we propose an alternative complexity measure by recasting the problem into sparse coding. Adaptivity can be achieved by a simple modular framework, which naturally exploits more intricate prior knowledge of the environment. Along the way, we also present a new gradient adaptive algorithm for static unconstrained OLO, designed using novel continuous time machinery. This could be of independent interest.
    Semi-supervised counterfactual explanations. (arXiv:2303.12634v1 [cs.LG])
    Counterfactual explanations for machine learning models are used to find minimal interventions to the feature values such that the model changes the prediction to a different output or a target output. A valid counterfactual explanation should have likely feature values. Here, we address the challenge of generating counterfactual explanations that lie in the same data distribution as that of the training data and more importantly, they belong to the target class distribution. This requirement has been addressed through the incorporation of auto-encoder reconstruction loss in the counterfactual search process. Connecting the output behavior of the classifier to the latent space of the auto-encoder has further improved the speed of the counterfactual search process and the interpretability of the resulting counterfactual explanations. Continuing this line of research, we show further improvement in the interpretability of counterfactual explanations when the auto-encoder is trained in a semi-supervised fashion with class tagged input data. We empirically evaluate our approach on several datasets and show considerable improvement in-terms of several metrics.
    Information-Based Sensor Placement for Data-Driven Estimation of Unsteady Flows. (arXiv:2303.12260v1 [physics.flu-dyn])
    Estimation of unsteady flow fields around flight vehicles may improve flow interactions and lead to enhanced vehicle performance. Although flow-field representations can be very high-dimensional, their dynamics can have low-order representations and may be estimated using a few, appropriately placed measurements. This paper presents a sensor-selection framework for the intended application of data-driven, flow-field estimation. This framework combines data-driven modeling, steady-state Kalman Filter design, and a sparsification technique for sequential selection of sensors. This paper also uses the sensor selection framework to design sensor arrays that can perform well across a variety of operating conditions. Flow estimation results on numerical data show that the proposed framework produces arrays that are highly effective at flow-field estimation for the flow behind and an airfoil at a high angle of attack using embedded pressure sensors. Analysis of the flow fields reveals that paths of impinging stagnation points along the airfoil's surface during a shedding period of the flow are highly informative locations for placement of pressure sensors.  ( 2 min )
    Geometry-Aware Latent Representation Learning for Modeling Disease Progression of Barrett's Esophagus. (arXiv:2303.12711v1 [eess.IV])
    Barrett's Esophagus (BE) is the only precursor known to Esophageal Adenocarcinoma (EAC), a type of esophageal cancer with poor prognosis upon diagnosis. Therefore, diagnosing BE is crucial in preventing and treating esophageal cancer. While supervised machine learning supports BE diagnosis, high interobserver variability in histopathological training data limits these methods. Unsupervised representation learning via Variational Autoencoders (VAEs) shows promise, as they map input data to a lower-dimensional manifold with only useful features, characterizing BE progression for improved downstream tasks and insights. However, the VAE's Euclidean latent space distorts point relationships, hindering disease progression modeling. Geometric VAEs provide additional geometric structure to the latent space, with RHVAE assuming a Riemannian manifold and $\mathcal{S}$-VAE a hyperspherical manifold. Our study shows that $\mathcal{S}$-VAE outperforms vanilla VAE with better reconstruction losses, representation classification accuracies, and higher-quality generated images and interpolations in lower-dimensional settings. By disentangling rotation information from the latent space, we improve results further using a group-based architecture. Additionally, we take initial steps towards $\mathcal{S}$-AE, a novel autoencoder model generating qualitative images without a variational framework, but retaining benefits of autoencoders such as stability and reconstruction quality.
    Efficient Multi-view Clustering via Unified and Discrete Bipartite Graph Learning. (arXiv:2209.04187v2 [cs.LG] UPDATED)
    Although previous graph-based multi-view clustering algorithms have gained significant progress, most of them are still faced with three limitations. First, they often suffer from high computational complexity, which restricts their applications in large-scale scenarios. Second, they usually perform graph learning either at the single-view level or at the view-consensus level, but often neglect the possibility of the joint learning of single-view and consensus graphs. Third, many of them rely on the k-means for discretization of the spectral embeddings, which lack the ability to directly learn the graph with discrete cluster structure. In light of this, this paper presents an efficient multi-view clustering approach via unified and discrete bipartite graph learning (UDBGL). Specifically, the anchor-based subspace learning is incorporated to learn the view-specific bipartite graphs from multiple views, upon which the bipartite graph fusion is leveraged to learn a view-consensus bipartite graph with adaptive weight learning. Further, the Laplacian rank constraint is imposed to ensure that the fused bipartite graph has discrete cluster structures (with a specific number of connected components). By simultaneously formulating the view-specific bipartite graph learning, the view-consensus bipartite graph learning, and the discrete cluster structure learning into a unified objective function, an efficient minimization algorithm is then designed to tackle this optimization problem and directly achieve a discrete clustering solution without requiring additional partitioning, which notably has linear time complexity in data size. Experiments on a variety of multi-view datasets demonstrate the robustness and efficiency of our UDBGL approach. The code is available at https://github.com/huangdonghere/UDBGL.
    Guiding Online Reinforcement Learning with Action-Free Offline Pretraining. (arXiv:2301.12876v2 [cs.LG] UPDATED)
    Offline RL methods have been shown to reduce the need for environment interaction by training agents using offline collected episodes. However, these methods typically require action information to be logged during data collection, which can be difficult or even impossible in some practical cases. In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, name this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL). We introduce Action-Free Guide (AF-Guide), a method that guides online training by extracting knowledge from action-free offline datasets. AF-Guide consists of an Action-Free Decision Transformer (AFDT) implementing a variant of Upside-Down Reinforcement Learning. It learns to plan the next states from the offline dataset, and a Guided Soft Actor-Critic (Guided SAC) that learns online with guidance from AFDT. Experimental results show that AF-Guide can improve sample efficiency and performance in online training thanks to the knowledge from the action-free offline dataset. Code is available at https://github.com/Vision-CAIR/AF-Guide.
    Challenges and opportunities for machine learning in multiscale computational modeling. (arXiv:2303.12261v1 [cs.LG])
    Many mechanical engineering applications call for multiscale computational modeling and simulation. However, solving for complex multiscale systems remains computationally onerous due to the high dimensionality of the solution space. Recently, machine learning (ML) has emerged as a promising solution that can either serve as a surrogate for, accelerate or augment traditional numerical methods. Pioneering work has demonstrated that ML provides solutions to governing systems of equations with comparable accuracy to those obtained using direct numerical methods, but with significantly faster computational speed. These high-speed, high-fidelity estimations can facilitate the solving of complex multiscale systems by providing a better initial solution to traditional solvers. This paper provides a perspective on the opportunities and challenges of using ML for complex multiscale modeling and simulation. We first outline the current state-of-the-art ML approaches for simulating multiscale systems and highlight some of the landmark developments. Next, we discuss current challenges for ML in multiscale computational modeling, such as the data and discretization dependence, interpretability, and data sharing and collaborative platform development. Finally, we suggest several potential research directions for the future.  ( 2 min )
    On Penalty-based Bilevel Gradient Descent Method. (arXiv:2302.05185v3 [cs.LG] UPDATED)
    Bilevel optimization enjoys a wide range of applications in hyper-parameter optimization, meta-learning and reinforcement learning. However, bilevel optimization problems are difficult to solve. Recent progress on scalable bilevel algorithms mainly focuses on bilevel optimization problems where the lower-level objective is either strongly convex or unconstrained. In this work, we tackle the bilevel problem through the lens of the penalty method. We show that under certain conditions, the penalty reformulation recovers the solutions of the original bilevel problem. Further, we propose the penalty-based bilevel gradient descent (PBGD) algorithm and establish its finite-time convergence for the constrained bilevel problem without lower-level strong convexity. Experiments showcase the efficiency of the proposed PBGD algorithm.
    New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography. (arXiv:2205.02900v2 [cs.LG] UPDATED)
    Undiagnosed diabetes is present in 21.4% of adults with diabetes. Diabetes can remain asymptomatic and undetected due to limitations in screening rates. To address this issue, questionnaires, such as the American Diabetes Association (ADA) Risk test, have been recommended for use by physicians and the public. Based on evidence that blood glucose concentration can affect cardiac electrophysiology, we hypothesized that an artificial intelligence (AI)-enhanced electrocardiogram (ECG) could identify adults with new-onset diabetes. We trained a neural network to estimate HbA1c using a 12-lead ECG and readily available demographics. We retrospectively assembled a dataset comprised of patients with paired ECG and HbA1c data. The population of patients who receive both an ECG and HbA1c may a biased sample of the complete outpatient population, so we adjusted the importance placed on each patient to generate a more representative pseudo-population. We found ECG-based assessment outperforms the ADA Risk test, achieving a higher area under the curve (0.80 vs. 0.68) and positive predictive value (13% vs. 9%) -- 2.6 times the prevalence of diabetes in the cohort. The AI-enhanced ECG significantly outperforms electrophysiologist interpretation of the ECG, suggesting that the task is beyond current clinical capabilities. Given the prevalence of ECGs in clinics and via wearable devices, such a tool would make precise, automated diabetes assessment widely accessible.
    GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. (arXiv:2210.17323v2 [cs.LG] UPDATED)
    Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models may require multiple performant GPUs, which limits the usability of such models. While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models. In this paper, we address this challenge, and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient. Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline. Our method more than doubles the compression gains relative to previously-proposed one-shot quantization methods, preserving accuracy, allowing us for the first time to execute an 175 billion-parameter model inside a single GPU for generative inference. Moreover, we also show that our method can still provide reasonable accuracy in the extreme quantization regime, in which weights are quantized to 2-bit or even ternary quantization levels. We show experimentally that these improvements can be leveraged for end-to-end inference speedups over FP16, of around 3.25x when using high-end GPUs (NVIDIA A100) and 4.5x when using more cost-effective ones (NVIDIA A6000). The implementation is available at https://github.com/IST-DASLab/gptq.
    A physics-aware deep learning model for energy localization in multiscale shock-to-detonation simulations of heterogeneous energetic materials. (arXiv:2211.04561v2 [cond-mat.mtrl-sci] UPDATED)
    Predictive simulations of the shock-to-detonation transition (SDT) in heterogeneous energetic materials (EM) are vital to the design and control of their energy release and sensitivity. Due to the complexity of the thermo-mechanics of EM during the SDT, both macro-scale response and sub-grid mesoscale energy localization must be captured accurately. This work proposes an efficient and accurate multiscale framework for SDT simulations of EM. We introduce a new approach for SDT simulation by using deep learning to model the mesoscale energy localization of shock-initiated EM microstructures. The proposed multiscale modeling framework is divided into two stages. First, a physics-aware recurrent convolutional neural network (PARC) is used to model the mesoscale energy localization of shock-initiated heterogeneous EM microstructures. PARC is trained using direct numerical simulations (DNS) of hotspot ignition and growth within microstructures of pressed HMX material subjected to different input shock strengths. After training, PARC is employed to supply hotspot ignition and growth rates for macroscale SDT simulations. We show that PARC can play the role of a surrogate model in a multiscale simulation framework, while drastically reducing the computation cost and providing improved representations of the sub-grid physics. The proposed multiscale modeling approach will provide a new tool for material scientists in designing high-performance and safer energetic materials.
    AIREPAIR: A Repair Platform for Neural Networks. (arXiv:2211.15387v2 [cs.NE] UPDATED)
    We present AIREPAIR, a platform for repairing neural networks. It features the integration of existing network repair tools. Based on AIREPAIR, one can run different repair methods on the same model, thus enabling the fair comparison of different repair techniques. We evaluate AIREPAIR with three state-of-the-art repair tools on popular deep-learning datasets and models. Our evaluation confirms the utility of AIREPAIR, by comparing and analyzing the results from different repair techniques. A demonstration is available at https://youtu.be/UkKw5neeWhw.
    SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation. (arXiv:2212.04493v2 [cs.CV] UPDATED)
    In this work, we present a novel framework built to simplify 3D asset generation for amateur users. To enable interactive generation, our method supports a variety of input modalities that can be easily provided by a human, including images, text, partially observed shapes and combinations of these, further allowing to adjust the strength of each input. At the core of our approach is an encoder-decoder, compressing 3D shapes into a compact latent representation, upon which a diffusion model is learned. To enable a variety of multi-modal inputs, we employ task-specific encoders with dropout followed by a cross-attention mechanism. Due to its flexibility, our model naturally supports a variety of tasks, outperforming prior works on shape completion, image-based 3D reconstruction, and text-to-3D. Most interestingly, our model can combine all these tasks into one swiss-army-knife tool, enabling the user to perform shape generation using incomplete shapes, images, and textual descriptions at the same time, providing the relative weights for each input and facilitating interactivity. Despite our approach being shape-only, we further show an efficient method to texture the generated shape using large-scale text-to-image models.
    Legs as Manipulator: Pushing Quadrupedal Agility Beyond Locomotion. (arXiv:2303.11330v2 [cs.RO] UPDATED)
    Locomotion has seen dramatic progress for walking or running across challenging terrains. However, robotic quadrupeds are still far behind their biological counterparts, such as dogs, which display a variety of agile skills and can use the legs beyond locomotion to perform several basic manipulation tasks like interacting with objects and climbing. In this paper, we take a step towards bridging this gap by training quadruped robots not only to walk but also to use the front legs to climb walls, press buttons, and perform object interaction in the real world. To handle this challenging optimization, we decouple the skill learning broadly into locomotion, which involves anything that involves movement whether via walking or climbing a wall, and manipulation, which involves using one leg to interact while balancing on the other three legs. These skills are trained in simulation using curriculum and transferred to the real world using our proposed sim2real variant that builds upon recent locomotion success. Finally, we combine these skills into a robust long-term plan by learning a behavior tree that encodes a high-level task hierarchy from one clean expert demonstration. We evaluate our method in both simulation and real-world showing successful executions of both short as well as long-range tasks and how robustness helps confront external perturbations. Videos at https://robot-skills.github.io
    The Alberta Plan for AI Research. (arXiv:2208.11173v3 [cs.AI] UPDATED)
    Herein we describe our approach to artificial intelligence research, which we call the Alberta Plan. The Alberta Plan is pursued within our research groups in Alberta and by others who are like minded throughout the world. We welcome all who would join us in this pursuit.
    Can we trust the evaluation on ChatGPT?. (arXiv:2303.12767v1 [cs.CL])
    ChatGPT, the first large language model (LLM) with mass adoption, has demonstrated remarkable performance in numerous natural language tasks. Despite its evident usefulness, evaluating ChatGPT's performance in diverse problem domains remains challenging due to the closed nature of the model and its continuous updates via Reinforcement Learning from Human Feedback (RLHF). We highlight the issue of data contamination in ChatGPT evaluations, with a case study of the task of stance detection. We discuss the challenge of preventing data contamination and ensuring fair model evaluation in the age of closed and continuously trained models.
    Comparison of Probabilistic Deep Learning Methods for Autism Detection. (arXiv:2303.12707v1 [cs.CV])
    Autism Spectrum Disorder (ASD) is one neuro developmental disorder that is now widespread in the world. ASD persists throughout the life of an individual, impacting the way they behave and communicate, resulting to notable deficits consisting of social life retardation, repeated behavioural traits and a restriction in their interests. Early detection of the disorder helps in the onset treatment and helps one to lead a normal life. There are clinical approaches used in detection of autism, relying on behavioural data and in worst cases, neuroimaging. Quantitative methods involving machine learning have been studied and developed to overcome issues with clinical approaches. These quantitative methods rely on machine learning, with some complex methods based on deep learning developed to accelerate detection and diagnosis of ASD. These literature is aimed at exploring most state-of-the-art probabilistic methods in use today, characterizing them with the type of dataset they're most applied on, their accuracy according to their novel research and how well they are suited in ASD classification. The findings will purposely serve as a benchmark in selection of the model to use when performing ASD detection.
    DG-Trans: Dual-level Graph Transformer for Spatiotemporal Incident Impact Prediction on Traffic Networks. (arXiv:2303.12238v1 [cs.LG])
    The prompt estimation of traffic incident impacts can guide commuters in their trip planning and improve the resilience of transportation agencies' decision-making on resilience. However, it is more challenging than node-level and graph-level forecasting tasks, as it requires extracting the anomaly subgraph or sub-time-series from dynamic graphs. In this paper, we propose DG-Trans, a novel traffic incident impact prediction framework, to foresee the impact of traffic incidents through dynamic graph learning. The proposed framework contains a dual-level spatial transformer and an importance-score-based temporal transformer, and the performance of this framework is justified by two newly constructed benchmark datasets. The dual-level spatial transformer removes unnecessary edges between nodes to isolate the affected subgraph from the other nodes. Meanwhile, the importance-score-based temporal transformer identifies abnormal changes in node features, causing the predictions to rely more on measurement changes after the incident occurs. Therefore, DG-Trans is equipped with dual abilities that extract spatiotemporal dependency and identify anomaly nodes affected by incidents while removing noise introduced by benign nodes. Extensive experiments on real-world datasets verify that DG-Trans outperforms the existing state-of-the-art methods, especially in extracting spatiotemporal dependency patterns and predicting traffic accident impacts. It offers promising potential for traffic incident management systems.
    Encoding Binary Concepts in the Latent Space of Generative Models for Enhancing Data Representation. (arXiv:2303.12255v1 [cs.LG])
    Binary concepts are empirically used by humans to generalize efficiently. And they are based on Bernoulli distribution which is the building block of information. These concepts span both low-level and high-level features such as "large vs small" and "a neuron is active or inactive". Binary concepts are ubiquitous features and can be used to transfer knowledge to improve model generalization. We propose a novel binarized regularization to facilitate learning of binary concepts to improve the quality of data generation in autoencoders. We introduce a binarizing hyperparameter $r$ in data generation process to disentangle the latent space symmetrically. We demonstrate that this method can be applied easily to existing variational autoencoder (VAE) variants to encourage symmetric disentanglement, improve reconstruction quality, and prevent posterior collapse without computation overhead. We also demonstrate that this method can boost existing models to learn more transferable representations and generate more representative samples for the input distribution which can alleviate catastrophic forgetting using generative replay under continual learning settings.
    MultiModal Bias: Introducing a Framework for Stereotypical Bias Assessment beyond Gender and Race in Vision Language Models. (arXiv:2303.12734v1 [cs.CV])
    Recent breakthroughs in self supervised training have led to a new class of pretrained vision language models. While there have been investigations of bias in multimodal models, they have mostly focused on gender and racial bias, giving much less attention to other relevant groups, such as minorities with regard to religion, nationality, sexual orientation, or disabilities. This is mainly due to lack of suitable benchmarks for such groups. We seek to address this gap by providing a visual and textual bias benchmark called MMBias, consisting of around 3,800 images and phrases covering 14 population subgroups. We utilize this dataset to assess bias in several prominent self supervised multimodal models, including CLIP, ALBEF, and ViLT. Our results show that these models demonstrate meaningful bias favoring certain groups. Finally, we introduce a debiasing method designed specifically for such large pre-trained models that can be applied as a post-processing step to mitigate bias, while preserving the remaining accuracy of the model.
    Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises. (arXiv:2303.12277v1 [math.OC])
    Recently, several studies consider the stochastic optimization problem but in a heavy-tailed noise regime, i.e., the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment (say being upper bounded by $\sigma^{p}$ for some $\sigma\geq0$) where $p\in(1,2]$, which not only generalizes the traditional finite variance assumption ($p=2$) but also has been observed in practice for several different tasks. Under this challenging assumption, lots of new progress has been made for either convex or nonconvex problems, however, most of which only consider smooth objectives. In contrast, people have not fully explored and well understood this problem when functions are nonsmooth. This paper aims to fill this crucial gap by providing a comprehensive analysis of stochastic nonsmooth convex optimization with heavy-tailed noises. We revisit a simple clipping-based algorithm, whereas, which is only proved to converge in expectation but under the additional strong convexity assumption. Under appropriate choices of parameters, for both convex and strongly convex functions, we not only establish the first high-probability rates but also give refined in-expectation bounds compared with existing works. Remarkably, all of our results are optimal (or nearly optimal up to logarithmic factors) with respect to the time horizon $T$ even when $T$ is unknown in advance. Additionally, we show how to make the algorithm parameter-free with respect to $\sigma$, in other words, the algorithm can still guarantee convergence without any prior knowledge of $\sigma$.
    Distribution-restrained Softmax Loss for the Model Robustness. (arXiv:2303.12363v1 [cs.LG])
    Recently, the robustness of deep learning models has received widespread attention, and various methods for improving model robustness have been proposed, including adversarial training, model architecture modification, design of loss functions, certified defenses, and so on. However, the principle of the robustness to attacks is still not fully understood, also the related research is still not sufficient. Here, we have identified a significant factor that affects the robustness of models: the distribution characteristics of softmax values for non-real label samples. We found that the results after an attack are highly correlated with the distribution characteristics, and thus we proposed a loss function to suppress the distribution diversity of softmax. A large number of experiments have shown that our method can improve robustness without significant time consumption.
    Inverting the Fundamental Diagram and Forecasting Boundary Conditions: How Machine Learning Can Improve Macroscopic Models for Traffic Flow. (arXiv:2303.12740v1 [cs.LG])
    In this paper, we aim at developing new methods to join machine learning techniques and macroscopic differential models for vehicular traffic estimation and forecast. It is well known that data-driven and model-driven approaches have (sometimes complementary) advantages and drawbacks. We consider here a dataset with flux and velocity data of vehicles moving on a highway, collected by fixed sensors and classified by lane and by class of vehicle. By means of a machine learning model based on an LSTM recursive neural network, we extrapolate two important pieces of information: 1) if congestion is appearing under the sensor, and 2) the total amount of vehicles which is going to pass under the sensor in the next future (30 min). These pieces of information are then used to improve the accuracy of an LWR-based first-order multi-class model describing the dynamics of traffic flow between sensors. The first piece of information is used to invert the (concave) fundamental diagram, thus recovering the density of vehicles from the flux data, and then inject directly the density datum in the model. This allows one to better approximate the dynamics between sensors, especially if an accident happens in a not monitored stretch of the road. The second piece of information is used instead as boundary conditions for the equations underlying the traffic model, to better reconstruct the total amount of vehicles on the road at any future time. Some examples motivated by real scenarios will be discussed. Real data are provided by the Italian motorway company Autovie Venete S.p.A.
    Optimizing CAD Models with Latent Space Manipulation. (arXiv:2303.12739v1 [cs.CV])
    When it comes to the optimization of CAD models in the automation domain, neural networks currently play only a minor role. Optimizing abstract features such as automation capability is challenging, since they can be very difficult to simulate, are too complex for rule-based systems, and also have little to no data available for machine-learning methods. On the other hand, image manipulation methods that can manipulate abstract features in images such as StyleCLIP have seen much success. They rely on the latent space of pretrained generative adversarial networks, and could therefore also make use of the vast amount of unlabeled CAD data. In this paper, we show that such an approach is also suitable for optimizing abstract automation-related features of CAD parts. We achieved this by extending StyleCLIP to work with CAD models in the form of voxel models, which includes using a 3D StyleGAN and a custom classifier. Finally, we demonstrate the ability of our system for the optimiziation of automation-related features by optimizing the grabability of various CAD models. This is an open access article under the CC BY-NC-ND license (this http URL) Peer review under the responsibility of the scientific committee of the 33rd CIRP Design Conference.
    Causal Reasoning in the Presence of Latent Confounders via Neural ADMG Learning. (arXiv:2303.12703v1 [cs.LG])
    Latent confounding has been a long-standing obstacle for causal reasoning from observational data. One popular approach is to model the data using acyclic directed mixed graphs (ADMGs), which describe ancestral relations between variables using directed and bidirected edges. However, existing methods using ADMGs are based on either linear functional assumptions or a discrete search that is complicated to use and lacks computational tractability for large datasets. In this work, we further extend the existing body of work and develop a novel gradient-based approach to learning an ADMG with non-linear functional relations from observational data. We first show that the presence of latent confounding is identifiable under the assumptions of bow-free ADMGs with non-linear additive noise models. With this insight, we propose a novel neural causal model based on autoregressive flows for ADMG learning. This not only enables us to determine complex causal structural relationships behind the data in the presence of latent confounding, but also estimate their functional relationships (hence treatment effects) simultaneously. We further validate our approach via experiments on both synthetic and real-world datasets, and demonstrate the competitive performance against relevant baselines.
    Open-source Frame Semantic Parsing. (arXiv:2303.12788v1 [cs.CL])
    While the state-of-the-art for frame semantic parsing has progressed dramatically in recent years, it is still difficult for end-users to apply state-of-the-art models in practice. To address this, we present Frame Semantic Transformer, an open-source Python library which achieves near state-of-the-art performance on FrameNet 1.7, while focusing on ease-of-use. We use a T5 model fine-tuned on Propbank and FrameNet exemplars as a base, and improve performance by using FrameNet lexical units to provide hints to T5 at inference time. We enhance robustness to real-world data by using textual data augmentations during training.
    DeepProphet2 -- A Deep Learning Gene Recommendation Engine. (arXiv:2208.01918v4 [q-bio.QM] UPDATED)
    New powerful tools for tackling life science problems have been created by recent advances in machine learning. The purpose of the paper is to discuss the potential advantages of gene recommendation performed by artificial intelligence (AI). Indeed, gene recommendation engines try to solve this problem: if the user is interested in a set of genes, which other genes are likely to be related to the starting set and should be investigated? This task was solved with a custom deep learning recommendation engine, DeepProphet2 (DP2), which is freely available to researchers worldwide via https://www.generecommender.com?utm_source=DeepProphet2_paper&utm_medium=pdf. Hereafter, insights behind the algorithm and its practical applications are illustrated. The gene recommendation problem can be addressed by mapping the genes to a metric space where a distance can be defined to represent the real semantic distance between them. To achieve this objective a transformer-based model has been trained on a well-curated freely available paper corpus, PubMed. The paper describes multiple optimization procedures that were employed to obtain the best bias-variance trade-off, focusing on embedding size and network depth. In this context, the model's ability to discover sets of genes implicated in diseases and pathways was assessed through cross-validation. A simple assumption guided the procedure: the network had no direct knowledge of pathways and diseases but learned genes' similarities and the interactions among them. Moreover, to further investigate the space where the neural network represents genes, the dimensionality of the embedding was reduced, and the results were projected onto a human-comprehensible space. In conclusion, a set of use cases illustrates the algorithm's potential applications in a real word setting.
    Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks. (arXiv:2302.07260v2 [cs.LG] UPDATED)
    Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augmented the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained optimization task involving shape optimization of rotor blades in turbo-machinery.
    Orthogonal Non-negative Matrix Factorization: a Maximum-Entropy-Principle Approach. (arXiv:2210.02672v2 [cs.DS] UPDATED)
    In this paper, we introduce a new methodology to solve the orthogonal nonnegative matrix factorization (ONMF) problem, where the objective is to approximate an input data matrix by a product of two nonnegative matrices, the features matrix and the mixing matrix, where one of them is orthogonal. We show how the ONMF can be interpreted as a specific facility-location problem (FLP), and adapt a maximum-entropy-principle based solution for FLP to the ONMF problem. The proposed approach guarantees orthogonality and sparsity of the features or the mixing matrix, while ensuring nonnegativity of both. Additionally, our methodology develops a quantitative characterization of ``true" number of underlying features - a hyperparameter required for the ONMF. An evaluation of the proposed method conducted on synthetic datasets, as well as a standard genetic microarray dataset indicates significantly better sparsity, orthogonality, and performance speed compared to similar methods in the literature, with comparable or improved reconstruction errors.
    Shape-Guided Diffusion with Inside-Outside Attention. (arXiv:2212.00210v2 [cs.CV] UPDATED)
    When manipulating an object, existing text-to-image diffusion models often ignore the shape of the object and generate content that is incorrectly scaled, cut off, or replaced with background content. We propose a training-free method, Shape-Guided Diffusion, that modifies pretrained diffusion models to be sensitive to shape input specified by a user or automatically inferred from text. We use a novel Inside-Outside Attention mechanism during the inversion and generation process to apply this shape constraint to the cross- and self-attention maps. Our mechanism designates which spatial region is the object (inside) vs. background (outside) then associates edits specified by text prompts to the correct region. We demonstrate the efficacy of our method on the shape-guided editing task, where the model must replace an object according to a text prompt and object mask. We curate a new ShapePrompts benchmark derived from MS-COCO and achieve SOTA results in shape faithfulness without a degradation in text alignment or image realism according to both automatic metrics and annotator ratings. Our data and code will be made available at https://shape-guided-diffusion.github.io.
    Long-term Causal Effects Estimation via Latent Surrogates Representation Learning. (arXiv:2208.04589v2 [cs.LG] UPDATED)
    Estimating long-term causal effects based on short-term surrogates is a significant but challenging problem in many real-world applications, e.g., marketing and medicine. Despite its success in certain domains, most existing methods estimate causal effects in an idealistic and simplistic way - ignoring the causal structure among short-term outcomes and treating all of them as surrogates. However, such methods cannot be well applied to real-world scenarios, in which the partially observed surrogates are mixed with their proxies among short-term outcomes. To this end, we develop our flexible method, Laser, to estimate long-term causal effects in the more realistic situation that the surrogates are observed or have observed proxies.Given the indistinguishability between the surrogates and proxies, we utilize identifiable variational auto-encoder (iVAE) to recover the whole valid surrogates on all the surrogates candidates without the need of distinguishing the observed surrogates or the proxies of latent surrogates. With the help of the recovered surrogates, we further devise an unbiased estimation of long-term causal effects. Extensive experimental results on the real-world and semi-synthetic datasets demonstrate the effectiveness of our proposed method.
    Sequence Generation via Subsequence Similarity: Theory and Application to UAV Identification. (arXiv:2301.08403v2 [cs.LG] UPDATED)
    The ability to generate synthetic sequences is crucial for a wide range of applications, and recent advances in deep learning architectures and generative frameworks have greatly facilitated this process. Particularly, unconditional one-shot generative models constitute an attractive line of research that focuses on capturing the internal information of a single image or video to generate samples with similar contents. Since many of those one-shot models are shifting toward efficient non-deep and non-adversarial approaches, we examine the versatility of a one-shot generative model for augmenting whole datasets. In this work, we focus on how similarity at the subsequence level affects similarity at the sequence level, and derive bounds on the optimal transport of real and generated sequences based on that of corresponding subsequences. We use a one-shot generative model to sample from the vicinity of individual sequences and generate subsequence-similar ones and demonstrate the improvement of this approach by applying it to the problem of Unmanned Aerial Vehicle (UAV) identification using limited radio-frequency (RF) signals. In the context of UAV identification, RF fingerprinting is an effective method for distinguishing legitimate devices from malicious ones, but heterogenous environments and channel impairments can impose data scarcity and affect the performance of classification models. By using subsequence similarity to augment sequences of RF data with a low ratio (5%-20%) of training dataset, we achieve significant improvements in performance metrics such as accuracy, precision, recall, and F1 score.
    Instance-Dependent Bounds for Zeroth-order Lipschitz Optimization with Error Certificates. (arXiv:2102.01977v5 [math.ST] CROSS LISTED)
    We study the problem of zeroth-order (black-box) optimization of a Lipschitz function $f$ defined on a compact subset $\mathcal X$ of $\mathbb R^d$, with the additional constraint that algorithms must certify the accuracy of their recommendations. We characterize the optimal number of evaluations of any Lipschitz function $f$ to find and certify an approximate maximizer of $f$ at accuracy $\varepsilon$. Under a weak assumption on $\mathcal X$, this optimal sample complexity is shown to be nearly proportional to the integral $\int_{\mathcal X} \mathrm{d}\boldsymbol x/( \max(f) - f(\boldsymbol x) + \varepsilon )^d$. This result, which was only (and partially) known in dimension $d=1$, solves an open problem dating back to 1991. In terms of techniques, our upper bound relies on a packing bound by Bouttier al. (2020) for the Piyavskii-Shubert algorithm that we link to the above integral. We also show that a certified version of the computationally tractable DOO algorithm matches these packing and integral bounds. Our instance-dependent lower bound differs from traditional worst-case lower bounds in the Lipschitz setting and relies on a local worst-case analysis that could likely prove useful for other learning tasks.
    Near-optimal inference in adaptive linear regression. (arXiv:2107.02266v3 [math.ST] UPDATED)
    When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation. Our proposed methods take advantage of the covariance structure present in the dataset and provide sharper estimates in directions for which more information has accrued. We establish an asymptotic normality property for our proposed online debiasing estimators under mild conditions on the data collection process and provide asymptotically exact confidence intervals. We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline by which to compare estimators. There are various conditions under which our proposed estimators achieve the minimax lower bound. We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
    Interpretable Reinforcement Learning via Neural Additive Models for Inventory Management. (arXiv:2303.10382v2 [cs.LG] UPDATED)
    The COVID-19 pandemic has highlighted the importance of supply chains and the role of digital management to react to dynamic changes in the environment. In this work, we focus on developing dynamic inventory ordering policies for a multi-echelon, i.e. multi-stage, supply chain. Traditional inventory optimization methods aim to determine a static reordering policy. Thus, these policies are not able to adjust to dynamic changes such as those observed during the COVID-19 crisis. On the other hand, conventional strategies offer the advantage of being interpretable, which is a crucial feature for supply chain managers in order to communicate decisions to their stakeholders. To address this limitation, we propose an interpretable reinforcement learning approach that aims to be as interpretable as the traditional static policies while being as flexible and environment-agnostic as other deep learning-based reinforcement learning solutions. We propose to use Neural Additive Models as an interpretable dynamic policy of a reinforcement learning agent, showing that this approach is competitive with a standard full connected policy. Finally, we use the interpretability property to gain insights into a complex ordering strategy for a simple, linear three-echelon inventory supply chain.
    A numerical approximation method for the Fisher-Rao distance between multivariate normal distributions. (arXiv:2302.08175v5 [cs.IT] UPDATED)
    We present a simple method to approximate Rao's distance between multivariate normal distributions based on discretizing curves joining normal distributions and approximating Rao's distances between successive nearby normal distributions on the curves by the square root of Jeffreys divergence, the symmetrized Kullback-Leibler divergence. We consider experimentally the linear interpolation curves in the ordinary, natural and expectation parameterizations of the normal distributions, and compare these curves with a curve derived from the Calvo and Oller's isometric embedding of the Fisher-Rao $d$-variate normal manifold into the cone of $(d+1)\times (d+1)$ symmetric positive-definite matrices [Journal of multivariate analysis 35.2 (1990): 223-242]. We report on our experiments and assess the quality of our approximation technique by comparing the numerical approximations with both lower and upper bounds. Finally, we present several information-geometric properties of the Calvo and Oller's isometric embedding.
    Causal Reasoning Meets Visual Representation Learning: A Prospective Study. (arXiv:2204.12037v8 [cs.CV] UPDATED)
    Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. Due to the emergence of huge amounts of multi-modal heterogeneous spatial/temporal/spatial-temporal data in big data era, the lack of interpretability, robustness, and out-of-distribution generalization are becoming the challenges of the existing visual models. The majority of the existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind the multi-modal knowledge, which lacks unified guidance and analysis about why modern visual representation learning methods easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms to realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose some prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussions, bring to the forefront the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards for reliable visual representation learning and related real-world applications more efficiently.
    Matrix Completion with Cross-Concentrated Sampling: Bridging Uniform Sampling and CUR Sampling. (arXiv:2208.09723v2 [cs.LG] UPDATED)
    While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets.
    Less is More: Unsupervised Mask-guided Annotated CT Image Synthesis with Minimum Manual Segmentations. (arXiv:2303.12747v1 [eess.IV])
    As a pragmatic data augmentation tool, data synthesis has generally returned dividends in performance for deep learning based medical image analysis. However, generating corresponding segmentation masks for synthetic medical images is laborious and subjective. To obtain paired synthetic medical images and segmentations, conditional generative models that use segmentation masks as synthesis conditions were proposed. However, these segmentation mask-conditioned generative models still relied on large, varied, and labeled training datasets, and they could only provide limited constraints on human anatomical structures, leading to unrealistic image features. Moreover, the invariant pixel-level conditions could reduce the variety of synthetic lesions and thus reduce the efficacy of data augmentation. To address these issues, in this work, we propose a novel strategy for medical image synthesis, namely Unsupervised Mask (UM)-guided synthesis, to obtain both synthetic images and segmentations using limited manual segmentation labels. We first develop a superpixel based algorithm to generate unsupervised structural guidance and then design a conditional generative model to synthesize images and annotations simultaneously from those unsupervised masks in a semi-supervised multi-task setting. In addition, we devise a multi-scale multi-task Fr\'echet Inception Distance (MM-FID) and multi-scale multi-task standard deviation (MM-STD) to harness both fidelity and variety evaluations of synthetic CT images. With multiple analyses on different scales, we could produce stable image quality measurements with high reproducibility. Compared with the segmentation mask guided synthesis, our UM-guided synthesis provided high-quality synthetic images with significantly higher fidelity, variety, and utility ($p<0.05$ by Wilcoxon Signed Ranked test).
    Adaptive Conformal Prediction by Reweighting Nonconformity Score. (arXiv:2303.12695v1 [stat.ML])
    Despite attractive theoretical guarantees and practical successes, Predictive Interval (PI) given by Conformal Prediction (CP) may not reflect the uncertainty of a given model. This limitation arises from CP methods using a constant correction for all test points, disregarding their individual uncertainties, to ensure coverage properties. To address this issue, we propose using a Quantile Regression Forest (QRF) to learn the distribution of nonconformity scores and utilizing the QRF's weights to assign more importance to samples with residuals similar to the test point. This approach results in PI lengths that are more aligned with the model's uncertainty. In addition, the weights learnt by the QRF provide a partition of the features space, allowing for more efficient computations and improved adaptiveness of the PI through groupwise conformalization. Our approach enjoys an assumption-free finite sample marginal and training-conditional coverage, and under suitable assumptions, it also ensures conditional coverage. Our methods work for any nonconformity score and are available as a Python package. We conduct experiments on simulated and real-world data that demonstrate significant improvements compared to existing methods.
    ExpressivE: A Spatio-Functional Embedding For Knowledge Graph Completion. (arXiv:2206.04192v2 [cs.LG] UPDATED)
    Knowledge graphs are inherently incomplete. Therefore substantial research has been directed toward knowledge graph completion (KGC), i.e., predicting missing triples from the information represented in the knowledge graph (KG). KG embedding models (KGEs) have yielded promising results for KGC, yet any current KGE is incapable of: (1) fully capturing vital inference patterns (e.g., composition), (2) capturing prominent patterns jointly (e.g., hierarchy and composition), and (3) providing an intuitive interpretation of captured patterns. In this work, we propose ExpressivE, a fully expressive spatio-functional KGE that solves all these challenges simultaneously. ExpressivE embeds pairs of entities as points and relations as hyper-parallelograms in the virtual triple space $\mathbb{R}^{2d}$. This model design allows ExpressivE not only to capture a rich set of inference patterns jointly but additionally to display any supported inference pattern through the spatial relation of hyper-parallelograms, offering an intuitive and consistent geometric interpretation of ExpressivE embeddings and their captured patterns. Experimental results on standard KGC benchmarks reveal that ExpressivE is competitive with state-of-the-art KGEs and even significantly outperforms them on WN18RR.
    Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality. (arXiv:2303.12785v1 [cs.LG])
    A novel Policy Gradient (PG) algorithm, called Matryoshka Policy Gradient (MPG), is introduced and studied, in the context of max-entropy reinforcement learning, where an agent aims at maximising entropy bonuses additional to its cumulative rewards. MPG differs from standard PG in that it trains a sequence of policies to learn finite horizon tasks simultaneously, instead of a single policy for the single standard objective. For softmax policies, we prove convergence of MPG and global optimality of the limit by showing that the only critical point of the MPG objective is the optimal policy; these results hold true even in the case of continuous compact state space. MPG is intuitive, theoretically sound and we furthermore show that the optimal policy of the standard max-entropy objective can be approximated arbitrarily well by the optimal policy of the MPG framework. Finally, we justify that MPG is well suited when the policies are parametrized with neural networks and we provide an simple criterion to verify the global optimality of the policy at convergence. As a proof of concept, we evaluate numerically MPG on standard test benchmarks.
    On The Effects Of Data Normalisation For Domain Adaptation On EEG Data. (arXiv:2210.01081v2 [cs.LG] UPDATED)
    In the Machine Learning (ML) literature, a well-known problem is the Dataset Shift problem where, differently from the ML standard hypothesis, the data in the training and test sets can follow different probability distributions, leading ML systems toward poor generalisation performances. This problem is intensely felt in the Brain-Computer Interface (BCI) context, where bio-signals as Electroencephalographic (EEG) are often used. In fact, EEG signals are highly non-stationary both over time and between different subjects. To overcome this problem, several proposed solutions are based on recent transfer learning approaches such as Domain Adaption (DA). In several cases, however, the actual causes of the improvements remain ambiguous. This paper focuses on the impact of data normalisation, or standardisation strategies applied together with DA methods. In particular, using \textit{SEED}, \textit{DEAP}, and \textit{BCI Competition IV 2a} EEG datasets, we experimentally evaluated the impact of different normalization strategies applied with and without several well-known DA methods, comparing the obtained performances. It results that the choice of the normalisation strategy plays a key role on the classifier performances in DA scenarios, and interestingly, in several cases, the use of only an appropriate normalisation schema outperforms the DA technique.
    Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models. (arXiv:2303.12748v1 [cs.CV])
    Calibration of deep learning models is crucial to their trustworthiness and safe usage, and as such, has been extensively studied in supervised classification models, with methods crafted to decrease miscalibration. However, there has yet to be a comprehensive study of the calibration of vision-language models that are used for zero-shot inference, like CLIP. We measure calibration across relevant variables like prompt, dataset, and architecture, and find that zero-shot inference with CLIP is miscalibrated. Furthermore, we propose a modified version of temperature scaling that is aligned with the common use cases of CLIP as a zero-shot inference model, and show that a single learned temperature generalizes for each specific CLIP model (defined by a chosen pre-training dataset and architecture) across inference dataset and prompt choice.
    Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains. (arXiv:2201.12224v4 [cs.LG] UPDATED)
    We consider a subclass of $n$-player stochastic games, in which players have their own internal state/action spaces while they are coupled through their payoff functions. It is assumed that players' internal chains are driven by independent transition probabilities. Moreover, players can receive only realizations of their payoffs, not the actual functions, and cannot observe each other's states/actions. For this class of games, we first show that finding a stationary Nash equilibrium (NE) policy without any assumption on the reward functions is interactable. However, for general reward functions, we develop polynomial-time learning algorithms based on dual averaging and dual mirror descent, which converge in terms of the averaged Nikaido-Isoda distance to the set of $\epsilon$-NE policies almost surely or in expectation. In particular, under extra assumptions on the reward functions such as social concavity, we derive polynomial upper bounds on the number of iterates to achieve an $\epsilon$-NE policy with high probability. Finally, we evaluate the effectiveness of the proposed algorithms in learning $\epsilon$-NE policies using numerical experiments for energy management in smart grids.
    MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures. (arXiv:2208.00277v4 [cs.CV] UPDATED)
    Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views. However, they rely upon specialized volumetric rendering algorithms based on ray marching that are mismatched to the capabilities of widely deployed graphics hardware. This paper introduces a new NeRF representation based on textured polygons that can synthesize novel images efficiently with standard rendering pipelines. The NeRF is represented as a set of polygons with textures representing binary opacities and feature vectors. Traditional rendering of the polygons with a z-buffer yields an image with features at every pixel, which are interpreted by a small, view-dependent MLP running in a fragment shader to produce a final pixel color. This approach enables NeRFs to be rendered with the traditional polygon rasterization pipeline, which provides massive pixel-level parallelism, achieving interactive frame rates on a wide range of compute platforms, including mobile phones.
    Learning Fractals by Gradient Descent. (arXiv:2303.12722v1 [cs.CV])
    Fractals are geometric shapes that can display complex and self-similar patterns found in nature (e.g., clouds and plants). Recent works in visual recognition have leveraged this property to create random fractal images for model pre-training. In this paper, we study the inverse problem -- given a target image (not necessarily a fractal), we aim to generate a fractal image that looks like it. We propose a novel approach that learns the parameters underlying a fractal image via gradient descent. We show that our approach can find fractal parameters of high visual quality and be compatible with different loss functions, opening up several potentials, e.g., learning fractals for downstream tasks, scientific understanding, etc.
    LSTM-based Video Quality Prediction Accounting for Temporal Distortions in Videoconferencing Calls. (arXiv:2303.12761v1 [eess.IV])
    Current state-of-the-art video quality models, such as VMAF, give excellent prediction results by comparing the degraded video with its reference video. However, they do not consider temporal distortions (e.g., frame freezes or skips) that occur during videoconferencing calls. In this paper, we present a data-driven approach for modeling such distortions automatically by training an LSTM with subjective quality ratings labeled via crowdsourcing. The videos were collected from live videoconferencing calls in 83 different network conditions. We applied QR codes as markers on the source videos to create aligned references and compute temporal features based on the alignment vectors. Using these features together with VMAF core features, our proposed model achieves a PCC of 0.99 on the validation set. Furthermore, our model outputs per-frame quality that gives detailed insight into the cause of video quality impairments. The VCM model and dataset are open-sourced at https://github.com/microsoft/Video_Call_MOS.
    RoBIC: A benchmark suite for assessing classifiers robustness. (arXiv:2102.05368v2 [cs.CV] UPDATED)
    Many defenses have emerged with the development of adversarial attacks. Models must be objectively evaluated accordingly. This paper systematically tackles this concern by proposing a new parameter-free benchmark we coin RoBIC. RoBIC fairly evaluates the robustness of image classifiers using a new half-distortion measure. It gauges the robustness of the network against white and black box attacks, independently of its accuracy. RoBIC is faster than the other available benchmarks. We present the significant differences in the robustness of 16 recent models as assessed by RoBIC.
    Conformal Prediction for Time Series with Modern Hopfield Networks. (arXiv:2303.12783v1 [cs.LG])
    To quantify uncertainty, conformal prediction methods are gaining continuously more interest and have already been successfully applied to various domains. However, they are difficult to apply to time series as the autocorrelative structure of time series violates basic assumptions required by conformal prediction. We propose HopCPT, a novel conformal prediction approach for time series that not only copes with temporal structures but leverages them. We show that our approach is theoretically well justified for time series where temporal dependencies are present. In experiments, we demonstrate that our new approach outperforms state-of-the-art conformal prediction methods on multiple real-world time series datasets from four different domains.
    Open Set Action Recognition via Multi-Label Evidential Learning. (arXiv:2303.12698v1 [cs.CV])
    Existing methods for open-set action recognition focus on novelty detection that assumes video clips show a single action, which is unrealistic in the real world. We propose a new method for open set action recognition and novelty detection via MUlti-Label Evidential learning (MULE), that goes beyond previous novel action detection methods by addressing the more general problems of single or multiple actors in the same scene, with simultaneous action(s) by any actor. Our Beta Evidential Neural Network estimates multi-action uncertainty with Beta densities based on actor-context-object relation representations. An evidence debiasing constraint is added to the objective function for optimization to reduce the static bias of video representations, which can incorrectly correlate predictions and static cues. We develop a learning algorithm based on a primal-dual average scheme update to optimize the proposed problem. Theoretical analysis of the optimization algorithm demonstrates the convergence of the primal solution sequence and bounds for both the loss function and the debiasing constraint. Uncertainty and belief-based novelty estimation mechanisms are formulated to detect novel actions. Extensive experiments on two real-world video datasets show that our proposed approach achieves promising performance in single/multi-actor, single/multi-action settings.
    Visualizing Semiotics in Generative Adversarial Networks. (arXiv:2303.12731v1 [cs.CV])
    We perform a set of experiments to demonstrate that images generated using a Generative Adversarial Network can be modified using 'semiotics.' We show that just as physical attributes such as the hue and saturation of an image can be modified, so too can its non-physical, abstract properties using our method. For example, the design of a flight attendant's uniform may be modified to look more 'alert,' less 'austere,' or more 'practical.' The form of a house can be modified to appear more 'futuristic,' a car more 'friendly' a pair of sneakers, 'evil.' Our method uncovers latent visual iconography associated with the semiotic property of interest, enabling a process of visual form-finding using abstract concepts. Our approach is iterative and allows control over the degree of attribute presence and can be used to aid the design process to yield emergent visual concepts.
    On-Device Unsupervised Image Segmentation. (arXiv:2303.12753v1 [cs.CV])
    Along with the breakthrough of convolutional neural networks, learning-based segmentation has emerged in many research works. Most of them are based on supervised learning, requiring plenty of annotated data; however, to support segmentation, a label for each pixel is required, which is obviously expensive. As a result, the issue of lacking annotated segmentation data commonly exists. Continuous learning is a promising way to deal with this issue; however, it still has high demands on human labor for annotation. What's more, privacy is highly required in segmentation data for real-world applications, which further calls for on-device learning. In this paper, we aim to resolve the above issue in an alternative way: Instead of supervised segmentation, we propose to develop efficient unsupervised segmentation that can be executed on edge devices. Based on our observation that segmentation can obtain high performance when pixels are mapped to a high-dimension space, we for the first time bring brain-inspired hyperdimensional computing (HDC) to the segmentation task. We build the HDC-based unsupervised segmentation framework, namely "SegHDC". In SegHDC, we devise a novel encoding approach that follows the Manhattan distance. A clustering algorithm is further developed on top of the encoded high-dimension vectors to obtain segmentation results. Experimental results show SegHDC can significantly surpass neural network-based unsupervised segmentation. On a standard segmentation dataset, DSB2018, SegHDC can achieve a 28.0% improvement in Intersection over Union (IoU) score; meanwhile, it achieves over 300x speedup on Raspberry PI. What's more, for a larger size image in the BBBC005 dataset, the existing approach cannot be accommodated to Raspberry PI due to out of memory; on the other hand, SegHDC can obtain segmentation results within 3 minutes while achieving a 0.9587 IoU score.
    DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion. (arXiv:2303.12743v1 [cs.CV])
    In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of the real world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external-occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external-occlusion at the global frame level are applied using the Hidden Point Removal (HPR) algorithm that is computationally efficient. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. Experiment results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for KITTI dataset. The code is available at https://github.com/SNU-DRL/DRCPO.git
    Posthoc Interpretation via Quantization. (arXiv:2303.12659v1 [cs.AI])
    In this paper, we introduce a new approach, called "Posthoc Interpretation via Quantization (PIQ)", for interpreting decisions made by trained classifiers. Our method utilizes vector quantization to transform the representations of a classifier into a discrete, class-specific latent space. The class-specific codebooks act as a bottleneck that forces the interpreter to focus on the parts of the input data deemed relevant by the classifier for making a prediction. We evaluated our method through quantitative and qualitative studies and found that PIQ generates interpretations that are more easily understood by participants to our user studies when compared to several other interpretation methods in the literature.
    Toward Data-Driven Glare Classification and Prediction for Marine Megafauna Survey. (arXiv:2303.12730v1 [cs.CV])
    Critically endangered species in Canadian North Atlantic waters are systematically surveyed to estimate species populations which influence governing policies. Due to its impact on policy, population accuracy is important. This paper lays the foundation towards a data-driven glare modelling system, which will allow surveyors to preemptively minimize glare. Surveyors use a detection function to estimate megafauna populations which are not explicitly seen. A goal of the research is to maximize useful imagery collected, to that end we will use our glare model to predict glare and optimize for glare-free data collection. To build this model, we leverage a small labelled dataset to perform semi-supervised learning. The large dataset is labelled with a Cascading Random Forest Model using a na\"ive pseudo-labelling approach. A reflectance model is used, which pinpoints features of interest, to populate our datasets which allows for context-aware machine learning models. The pseudo-labelled dataset is used on two models: a Multilayer Perceptron and a Recurrent Neural Network. With this paper, we lay the foundation for data-driven mission planning; a glare modelling system which allows surveyors to preemptively minimize glare and reduces survey reliance on the detection function as an estimator of whale populations during periods of poor subsurface visibility.
    Robust Holographic mmWave Beamforming by Self-Supervised Hybrid Deep Learning. (arXiv:2303.12653v1 [cs.IT])
    Beamforming with large-scale antenna arrays has been widely used in recent years, which is acknowledged as an important part in 5G and incoming 6G. Thus, various techniques are leveraged to improve its performance, e.g., deep learning, advanced optimization algorithms, etc. Although its performance in many previous research scenarios with deep learning is quite attractive, usually it drops rapidly when the environment or dataset is changed. Therefore, designing effective beamforming network with strong robustness is an open issue for the intelligent wireless communications. In this paper, we propose a robust beamforming self-supervised network, and verify it in two kinds of different datasets with various scenarios. Simulation results show that the proposed self-supervised network with hybrid learning performs well in both classic DeepMIMO and new WAIR-D dataset with the strong robustness under the various environments. Also, we present the principle to explain the rationality of this kind of hybrid learning, which is instructive to apply with more kinds of datasets.
    Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN. (arXiv:2204.14079v3 [cs.CV] UPDATED)
    Transfer learning of StyleGAN has recently shown great potential to solve diverse tasks, especially in domain translation. Previous methods utilized a source model by swapping or freezing weights during transfer learning, however, they have limitations on visual quality and controlling source features. In other words, they require additional models that are computationally demanding and have restricted control steps that prevent a smooth transition. In this paper, we propose a new approach to overcome these limitations. Instead of swapping or freezing, we introduce a simple feature matching loss to improve generation quality. In addition, to control the degree of source features, we train a target model with the proposed strategy, FixNoise, to preserve the source features only in a disentangled subspace of a target feature space. Owing to the disentangled feature space, our method can smoothly control the degree of the source features in a single model. Extensive experiments demonstrate that the proposed method can generate more consistent and realistic images than previous works.
    Multi-modal Variational Autoencoders for normative modelling across multiple imaging modalities. (arXiv:2303.12706v1 [cs.CV])
    One of the challenges of studying common neurological disorders is disease heterogeneity including differences in causes, neuroimaging characteristics, comorbidities, or genetic variation. Normative modelling has become a popular method for studying such cohorts where the 'normal' behaviour of a physiological system is modelled and can be used at subject level to detect deviations relating to disease pathology. For many heterogeneous diseases, we expect to observe abnormalities across a range of neuroimaging and biological variables. However, thus far, normative models have largely been developed for studying a single imaging modality. We aim to develop a multi-modal normative modelling framework where abnormality is aggregated across variables of multiple modalities and is better able to detect deviations than uni-modal baselines. We propose two multi-modal VAE normative models to detect subject level deviations across T1 and DTI data. Our proposed models were better able to detect diseased individuals, capture disease severity, and correlate with patient cognition than baseline approaches. We also propose a multivariate latent deviation metric, measuring deviations from the joint latent space, which outperformed feature-based metrics.
    Toward Polar Sea-Ice Classification using Color-based Segmentation and Auto-labeling of Sentinel-2 Imagery to Train an Efficient Deep Learning Model. (arXiv:2303.12719v1 [cs.CV])
    Global warming is an urgent issue that is generating catastrophic environmental changes, such as the melting of sea ice and glaciers, particularly in the polar regions. The melting pattern and retreat of polar sea ice cover is an essential indicator of global warming. The Sentinel-2 satellite (S2) captures high-resolution optical imagery over the polar regions. This research aims at developing a robust and effective system for classifying polar sea ice as thick or snow-covered, young or thin, or open water using S2 images. A key challenge is the lack of labeled S2 training data to serve as the ground truth. We demonstrate a method with high precision to segment and automatically label the S2 images based on suitably determined color thresholds and employ these auto-labeled data to train a U-Net machine model (a fully convolutional neural network), yielding good classification accuracy. Evaluation results over S2 data from the polar summer season in the Ross Sea region of the Antarctic show that the U-Net model trained on auto-labeled data has an accuracy of 90.18% over the original S2 images, whereas the U-Net model trained on manually labeled data has an accuracy of 91.39%. Filtering out the thin clouds and shadows from the S2 images further improves U-Net's accuracy, respectively, to 98.97% for auto-labeled and 98.40% for manually labeled training datasets.
    An Extended Study of Human-like Behavior under Adversarial Training. (arXiv:2303.12669v1 [cs.CV])
    Neural networks have a number of shortcomings. Amongst the severest ones is the sensitivity to distribution shifts which allows models to be easily fooled into wrong predictions by small perturbations to inputs that are often imperceivable to humans and do not have to carry semantic meaning. Adversarial training poses a partial solution to address this issue by training models on worst-case perturbations. Yet, recent work has also pointed out that the reasoning in neural networks is different from humans. Humans identify objects by shape, while neural nets mainly employ texture cues. Exemplarily, a model trained on photographs will likely fail to generalize to datasets containing sketches. Interestingly, it was also shown that adversarial training seems to favorably increase the shift toward shape bias. In this work, we revisit this observation and provide an extensive analysis of this effect on various architectures, the common $\ell_2$- and $\ell_\infty$-training, and Transformer-based models. Further, we provide a possible explanation for this phenomenon from a frequency perspective.
    Non-convex approaches for low-rank tensor completion under tubal sampling. (arXiv:2303.12721v1 [cs.LG])
    Tensor completion is an important problem in modern data analysis. In this work, we investigate a specific sampling strategy, referred to as tubal sampling. We propose two novel non-convex tensor completion frameworks that are easy to implement, named tensor $L_1$-$L_2$ (TL12) and tensor completion via CUR (TCCUR). We test the efficiency of both methods on synthetic data and a color image inpainting problem. Empirical results reveal a trade-off between the accuracy and time efficiency of these two methods in a low sampling ratio. Each of them outperforms some classical completion methods in at least one aspect.
    Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access. (arXiv:2303.12718v1 [cs.LG])
    A central task in control theory, artificial intelligence, and formal methods is to synthesize reward-maximizing strategies for agents that operate in partially unknown environments. In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions are known in terms of successor states but not the stochastics involved. In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that utilizes interval MDPs as internal model. To compete with limited sampling access in reinforcement learning, we incorporate two novel concepts into our algorithm, focusing on rapid and successful learning rather than on stochastic guarantees and optimality: lower confidence bound exploration reinforces variants of already learned practical strategies and action scoping reduces the learning action space to promising actions. We illustrate benefits of our algorithms by means of a prototypical implementation applied on examples from the AI and formal methods communities.
    Few-shot Multimodal Multitask Multilingual Learning. (arXiv:2303.12489v1 [cs.LG])
    While few-shot learning as a transfer learning paradigm has gained significant traction for scenarios with limited data, it has primarily been explored in the context of building unimodal and unilingual models. Furthermore, a significant part of the existing literature in the domain of few-shot multitask learning perform in-context learning which requires manually generated prompts as the input, yielding varying outcomes depending on the level of manual prompt-engineering. In addition, in-context learning suffers from substantial computational, memory, and storage costs which eventually leads to high inference latency because it involves running all of the prompt's examples through the model every time a prediction is made. In contrast, methods based on the transfer learning via the fine-tuning paradigm avoid the aforementioned issues at a one-time cost of fine-tuning weights on a per-task basis. However, such methods lack exposure to few-shot multimodal multitask learning. In this paper, we propose few-shot learning for a multimodal multitask multilingual (FM3) setting by adapting pre-trained vision and language models using task-specific hypernetworks and contrastively fine-tuning them to enable few-shot learning. FM3's architecture combines the best of both worlds of in-context and fine-tuning based learning and consists of three major components: (i) multimodal contrastive fine-tuning to enable few-shot learning, (ii) hypernetwork task adaptation to perform multitask learning, and (iii) task-specific output heads to cater to a plethora of diverse tasks. FM3 learns the most prominent tasks in the vision and language domains along with their intersections, namely visual entailment (VE), visual question answering (VQA), and natural language understanding (NLU) tasks such as neural entity recognition (NER) and the GLUE benchmark including QNLI, MNLI, QQP, and SST-2.
    Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding. (arXiv:2303.12513v1 [cs.CV])
    Most humans use visual imagination to understand and reason about language, but models such as BERT reason about language using knowledge acquired during text-only pretraining. In this work, we investigate whether vision-and-language pretraining can improve performance on text-only tasks that involve implicit visual reasoning, focusing primarily on zero-shot probing methods. We propose a suite of visual language understanding (VLU) tasks for probing the visual reasoning abilities of text encoder models, as well as various non-visual natural language understanding (NLU) tasks for comparison. We also contribute a novel zero-shot knowledge probing method, Stroop probing, for applying models such as CLIP to text-only tasks without needing a prediction head such as the masked language modelling head of models like BERT. We show that SOTA multimodally trained text encoders outperform unimodally trained text encoders on the VLU tasks while being underperformed by them on the NLU tasks, lending new context to previously mixed results regarding the NLU capabilities of multimodal models. We conclude that exposure to images during pretraining affords inherent visual reasoning knowledge that is reflected in language-only tasks that require implicit visual reasoning. Our findings bear importance in the broader context of multimodal learning, providing principled guidelines for the choice of text encoders used in such contexts.
    Disturbance Injection under Partial Automation: Robust Imitation Learning for Long-horizon Tasks. (arXiv:2303.12375v1 [cs.RO])
    Partial Automation (PA) with intelligent support systems has been introduced in industrial machinery and advanced automobiles to reduce the burden of long hours of human operation. Under PA, operators perform manual operations (providing actions) and operations that switch to automatic/manual mode (mode-switching). Since PA reduces the total duration of manual operation, these two action and mode-switching operations can be replicated by imitation learning with high sample efficiency. To this end, this paper proposes Disturbance Injection under Partial Automation (DIPA) as a novel imitation learning framework. In DIPA, mode and actions (in the manual mode) are assumed to be observables in each state and are used to learn both action and mode-switching policies. The above learning is robustified by injecting disturbances into the operator's actions to optimize the disturbance's level for minimizing the covariate shift under PA. We experimentally validated the effectiveness of our method for long-horizon tasks in two simulations and a real robot environment and confirmed that our method outperformed the previous methods and reduced the demonstration burden.
    Solving Bilevel Knapsack Problem using Graph Neural Networks. (arXiv:2211.13436v2 [cs.AI] UPDATED)
    The Bilevel Optimization Problem is a hierarchical optimization problem with two agents, a leader and a follower. The leader make their own decisions first, and the followers make the best choices accordingly. The leader knows the information of the followers, and the goal of the problem is to find the optimal solution by considering the reactions of the followers from the leader's point of view. For the Bilevel Optimization Problem, there are no general and efficient algorithms or commercial solvers to get an optimal solution, and it is very difficult to get a good solution even for a simple problem. In this paper, we propose a deep learning approach using Graph Neural Networks to solve the bilevel knapsack problem. We train the model to predict the leader's solution and use it to transform the hierarchical optimization problem into a single-level optimization problem to get the solution. Our model found the feasible solution that was about 500 times faster than the exact algorithm with $1.7\%$ optimal gap. Also, our model performed well on problems of different size from the size it was trained on.
    Lower Bound on the Bayesian Risk via Information Measure. (arXiv:2303.12497v1 [cs.IT])
    This paper focuses on parameter estimation and introduces a new method for lower bounding the Bayesian risk. The method allows for the use of virtually \emph{any} information measure, including R\'enyi's $\alpha$, $\varphi$-Divergences, and Sibson's $\alpha$-Mutual Information. The approach considers divergences as functionals of measures and exploits the duality between spaces of measures and spaces of functions. In particular, we show that one can lower bound the risk with any information measure by upper bounding its dual via Markov's inequality. We are thus able to provide estimator-independent impossibility results thanks to the Data-Processing Inequalities that divergences satisfy. The results are then applied to settings of interest involving both discrete and continuous parameters, including the ``Hide-and-Seek'' problem, and compared to the state-of-the-art techniques. An important observation is that the behaviour of the lower bound in the number of samples is influenced by the choice of the information measure. We leverage this by introducing a new divergence inspired by the ``Hockey-Stick'' Divergence, which is demonstrated empirically to provide the largest lower-bound across all considered settings. If the observations are subject to privatisation, stronger impossibility results can be obtained via Strong Data-Processing Inequalities. The paper also discusses some generalisations and alternative directions.
    AIIPot: Adaptive Intelligent-Interaction Honeypot for IoT Devices. (arXiv:2303.12367v1 [cs.CR])
    The proliferation of the Internet of Things (IoT) has raised concerns about the security of connected devices. There is a need to develop suitable and cost-efficient methods to identify vulnerabilities in IoT devices in order to address them before attackers seize opportunities to compromise them. The deception technique is a prominent approach to improving the security posture of IoT systems. Honeypot is a popular deception technique that mimics interaction in real fashion and encourages unauthorised users (attackers) to launch attacks. Due to the large number and the heterogeneity of IoT devices, manually crafting the low and high-interaction honeypots is not affordable. This has forced researchers to seek innovative ways to build honeypots for IoT devices. In this paper, we propose a honeypot for IoT devices that uses machine learning techniques to learn and interact with attackers automatically. The evaluation of the proposed model indicates that our system can improve the session length with attackers and capture more attacks on the IoT network.
    Traffic Volume Prediction using Memory-Based Recurrent Neural Networks: A comparative analysis of LSTM and GRU. (arXiv:2303.12643v1 [cs.LG])
    Predicting traffic volume in real-time can improve both traffic flow and road safety. A precise traffic volume forecast helps alert drivers to the flow of traffic along their preferred routes, preventing potential deadlock situations. Existing parametric models cannot reliably forecast traffic volume in dynamic and complex traffic conditions. Therefore, in order to evaluate and forecast the traffic volume for every given time step in a real-time manner, we develop non-linear memory-based deep neural network models. Our extensive experiments run on the Metro Interstate Traffic Volume dataset demonstrate the effectiveness of the proposed models in predicting traffic volume in highly dynamic and heterogeneous traffic environments.
    Fairness Improves Learning from Noisily Labeled Long-Tailed Data. (arXiv:2303.12291v1 [cs.LG])
    Both long-tailed and noisily labeled data frequently appear in real-world applications and impose significant challenges for learning. Most prior works treat either problem in an isolated way and do not explicitly consider the coupling effects of the two. Our empirical observation reveals that such solutions fail to consistently improve the learning when the dataset is long-tailed with label noise. Moreover, with the presence of label noise, existing methods do not observe universal improvements across different sub-populations; in other words, some sub-populations enjoyed the benefits of improved accuracy at the cost of hurting others. Based on these observations, we introduce the Fairness Regularizer (FR), inspired by regularizing the performance gap between any two sub-populations. We show that the introduced fairness regularizer improves the performances of sub-populations on the tail and the overall learning performance. Extensive experiments demonstrate the effectiveness of the proposed solution when complemented with certain existing popular robust or class-balanced methods.
    SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot. (arXiv:2301.00774v3 [cs.LG] UPDATED)
    We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
    A General Algorithm for Solving Rank-one Matrix Sensing. (arXiv:2303.12298v1 [cs.DS])
    Matrix sensing has many real-world applications in science and engineering, such as system control, distance embedding, and computer vision. The goal of matrix sensing is to recover a matrix $A_\star \in \mathbb{R}^{n \times n}$, based on a sequence of measurements $(u_i,b_i) \in \mathbb{R}^{n} \times \mathbb{R}$ such that $u_i^\top A_\star u_i = b_i$. Previous work [ZJD15] focused on the scenario where matrix $A_{\star}$ has a small rank, e.g. rank-$k$. Their analysis heavily relies on the RIP assumption, making it unclear how to generalize to high-rank matrices. In this paper, we relax that rank-$k$ assumption and solve a much more general matrix sensing problem. Given an accuracy parameter $\delta \in (0,1)$, we can compute $A \in \mathbb{R}^{n \times n}$ in $\widetilde{O}(m^{3/2} n^2 \delta^{-1} )$, such that $ |u_i^\top A u_i - b_i| \leq \delta$ for all $i \in [m]$. We design an efficient algorithm with provable convergence guarantees using stochastic gradient descent for this problem.
    $\mathcal{C}^k$-continuous Spline Approximation with TensorFlow Gradient Descent Optimizers. (arXiv:2303.12454v1 [cs.LG])
    In this work we present an "out-of-the-box" application of Machine Learning (ML) optimizers for an industrial optimization problem. We introduce a piecewise polynomial model (spline) for fitting of $\mathcal{C}^k$-continuos functions, which can be deployed in a cam approximation setting. We then use the gradient descent optimization context provided by the machine learning framework TensorFlow to optimize the model parameters with respect to approximation quality and $\mathcal{C}^k$-continuity and evaluate available optimizers. Our experiments show that the problem solution is feasible using TensorFlow gradient tapes and that AMSGrad and SGD show the best results among available TensorFlow optimizers. Furthermore, we introduce a novel regularization approach to improve SGD convergence. Although experiments show that remaining discontinuities after optimization are small, we can eliminate these errors using a presented algorithm which has impact only on affected derivatives in the local spline segment.
    Split-Et-Impera: A Framework for the Design of Distributed Deep Learning Applications. (arXiv:2303.12524v1 [cs.DC])
    Many recent pattern recognition applications rely on complex distributed architectures in which sensing and computational nodes interact together through a communication network. Deep neural networks (DNNs) play an important role in this scenario, furnishing powerful decision mechanisms, at the price of a high computational effort. Consequently, powerful state-of-the-art DNNs are frequently split over various computational nodes, e.g., a first part stays on an embedded device and the rest on a server. Deciding where to split a DNN is a challenge in itself, making the design of deep learning applications even more complicated. Therefore, we propose Split-Et-Impera, a novel and practical framework that i) determines the set of the best-split points of a neural network based on deep network interpretability principles without performing a tedious try-and-test approach, ii) performs a communication-aware simulation for the rapid evaluation of different neural network rearrangements, and iii) suggests the best match between the quality of service requirements of the application and the performance in terms of accuracy and latency time.
    Reliable and Efficient Evaluation of Adversarial Robustness for Deep Hashing-Based Retrieval. (arXiv:2303.12658v1 [cs.CV])
    Deep hashing has been extensively applied to massive image retrieval due to its efficiency and effectiveness. Recently, several adversarial attacks have been presented to reveal the vulnerability of deep hashing models against adversarial examples. However, existing attack methods suffer from degraded performance or inefficiency because they underutilize the semantic relations between original samples or spend a lot of time learning these relations with a deep neural network. In this paper, we propose a novel Pharos-guided Attack, dubbed PgA, to evaluate the adversarial robustness of deep hashing networks reliably and efficiently. Specifically, we design pharos code to represent the semantics of the benign image, which preserves the similarity to semantically relevant samples and dissimilarity to irrelevant ones. It is proven that we can quickly calculate the pharos code via a simple math formula. Accordingly, PgA can directly conduct a reliable and efficient attack on deep hashing-based retrieval by maximizing the similarity between the hash code of the adversarial example and the pharos code. Extensive experiments on the benchmark datasets verify that the proposed algorithm outperforms the prior state-of-the-arts in both attack strength and speed.
    Frozen Language Model Helps ECG Zero-Shot Learning. (arXiv:2303.12311v1 [cs.LG])
    The electrocardiogram (ECG) is one of the most commonly used non-invasive, convenient medical monitoring tools that assist in the clinical diagnosis of heart diseases. Recently, deep learning (DL) techniques, particularly self-supervised learning (SSL), have demonstrated great potential in the classification of ECG. SSL pre-training has achieved competitive performance with only a small amount of annotated data after fine-tuning. However, current SSL methods rely on the availability of annotated data and are unable to predict labels not existing in fine-tuning datasets. To address this challenge, we propose Multimodal ECG-Text Self-supervised pre-training (METS), the first work to utilize the auto-generated clinical reports to guide ECG SSL pre-training. We use a trainable ECG encoder and a frozen language model to embed paired ECG and automatically machine-generated clinical reports separately. The SSL aims to maximize the similarity between paired ECG and auto-generated report while minimize the similarity between ECG and other reports. In downstream classification tasks, METS achieves around 10% improvement in performance without using any annotated data via zero-shot classification, compared to other supervised and SSL baselines that rely on annotated data. Furthermore, METS achieves the highest recall and F1 scores on the MIT-BIH dataset, despite MIT-BIH containing different classes of ECG compared to the pre-trained dataset. The extensive experiments have demonstrated the advantages of using ECG-Text multimodal self-supervised learning in terms of generalizability, effectiveness, and efficiency.
    Multiscale Attention via Wavelet Neural Operators for Vision Transformers. (arXiv:2303.12398v1 [cs.CV])
    Transformers have achieved widespread success in computer vision. At their heart, there is a Self-Attention (SA) mechanism, an inductive bias that associates each token in the input with every other token through a weighted basis. The standard SA mechanism has quadratic complexity with the sequence length, which impedes its utility to long sequences appearing in high resolution vision. Recently, inspired by operator learning for PDEs, Adaptive Fourier Neural Operators (AFNO) were introduced for high resolution attention based on global convolution that is efficiently implemented via FFT. However, the AFNO global filtering cannot well represent small and moderate scale structures that commonly appear in natural images. To leverage the coarse-to-fine scale structures we introduce a Multiscale Wavelet Attention (MWA) by leveraging wavelet neural operators which incurs linear complexity in the sequence size. We replace the attention in ViT with MWA and our experiments with CIFAR and ImageNet classification demonstrate significant improvement over alternative Fourier-based attentions such as AFNO and Global Filter Network (GFN).
    Unsupervised Domain Adaptation for Training Event-Based Networks Using Contrastive Learning and Uncorrelated Conditioning. (arXiv:2303.12424v1 [cs.CV])
    Event-based cameras offer reliable measurements for preforming computer vision tasks in high-dynamic range environments and during fast motion maneuvers. However, adopting deep learning in event-based vision faces the challenge of annotated data scarcity due to recency of event cameras. Transferring the knowledge that can be obtained from conventional camera annotated data offers a practical solution to this challenge. We develop an unsupervised domain adaptation algorithm for training a deep network for event-based data image classification using contrastive learning and uncorrelated conditioning of data. Our solution outperforms the existing algorithms for this purpose.
    On Domain-Specific Pre-Training for Effective Semantic Perception in Agricultural Robotics. (arXiv:2303.12499v1 [cs.CV])
    Agricultural robots have the prospect to enable more efficient and sustainable agricultural production of food, feed, and fiber. Perception of crops and weeds is a central component of agricultural robots that aim to monitor fields and assess the plants as well as their growth stage in an automatic manner. Semantic perception mostly relies on deep learning using supervised approaches, which require time and qualified workers to label fairly large amounts of data. In this paper, we look into the problem of reducing the amount of labels without compromising the final segmentation performance. For robots operating in the field, pre-training networks in a supervised way is already a popular method to reduce the number of required labeled images. We investigate the possibility of pre-training in a self-supervised fashion using data from the target domain. To better exploit this data, we propose a set of domain-specific augmentation strategies. We evaluate our pre-training on semantic segmentation and leaf instance segmentation, two important tasks in our domain. The experimental results suggest that pre-training with domain-specific data paired with our data augmentation strategy leads to superior performance compared to commonly used pre-trainings. Furthermore, the pre-trained networks obtain similar performance to the fully supervised with less labeled data.
    Neuro-Symbolic Reasoning Shortcuts: Mitigation Strategies and their Limitations. (arXiv:2303.12578v1 [cs.AI])
    Neuro-symbolic predictors learn a mapping from sub-symbolic inputs to higher-level concepts and then carry out (probabilistic) logical inference on this intermediate representation. This setup offers clear advantages in terms of consistency to symbolic prior knowledge, and is often believed to provide interpretability benefits in that - by virtue of complying with the knowledge - the learned concepts can be better understood by human stakeholders. However, it was recently shown that this setup is affected by reasoning shortcuts whereby predictions attain high accuracy by leveraging concepts with unintended semantics, yielding poor out-of-distribution performance and compromising interpretability. In this short paper, we establish a formal link between reasoning shortcuts and the optima of the loss function, and identify situations in which reasoning shortcuts can arise. Based on this, we discuss limitations of natural mitigation strategies such as reconstruction and concept supervision.
    Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games. (arXiv:2303.12287v1 [cs.LG])
    We consider the problem of decentralized multi-agent reinforcement learning in Markov games. A fundamental question is whether there exist algorithms that, when adopted by all agents and run independently in a decentralized fashion, lead to no-regret for each player, analogous to celebrated convergence results in normal-form games. While recent work has shown that such algorithms exist for restricted settings (notably, when regret is defined with respect to deviations to Markovian policies), the question of whether independent no-regret learning can be achieved in the standard Markov game framework was open. We provide a decisive negative resolution this problem, both from a computational and statistical perspective. We show that: - Under the widely-believed assumption that PPAD-hard problems cannot be solved in polynomial time, there is no polynomial-time algorithm that attains no-regret in general-sum Markov games when executed independently by all players, even when the game is known to the algorithm designer and the number of players is a small constant. - When the game is unknown, no algorithm, regardless of computational efficiency, can achieve no-regret without observing a number of episodes that is exponential in the number of players. Perhaps surprisingly, our lower bounds hold even for seemingly easier setting in which all agents are controlled by a a centralized algorithm. They are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is sparse in the sense that it can be represented as a mixture of a small number of product policies. The crux of our approach is a novel application of aggregation techniques from online learning, whereby we show that any algorithm for the SparseCCE problem can be used to compute approximate Nash equilibria for non-zero sum normal-form games.
    Wasserstein Adversarial Examples on Univariant Time Series Data. (arXiv:2303.12357v1 [cs.LG])
    Adversarial examples are crafted by adding indistinguishable perturbations to normal examples in order to fool a well-trained deep learning model to misclassify. In the context of computer vision, this notion of indistinguishability is typically bounded by $L_{\infty}$ or other norms. However, these norms are not appropriate for measuring indistinguishiability for time series data. In this work, we propose adversarial examples in the Wasserstein space for time series data for the first time and utilize Wasserstein distance to bound the perturbation between normal examples and adversarial examples. We introduce Wasserstein projected gradient descent (WPGD), an adversarial attack method for perturbing univariant time series data. We leverage the closed-form solution of Wasserstein distance in the 1D space to calculate the projection step of WPGD efficiently with the gradient descent method. We further propose a two-step projection so that the search of adversarial examples in the Wasserstein space is guided and constrained by Euclidean norms to yield more effective and imperceptible perturbations. We empirically evaluate the proposed attack on several time series datasets in the healthcare domain. Extensive results demonstrate that the Wasserstein attack is powerful and can successfully attack most of the target classifiers with a high attack success rate. To better study the nature of Wasserstein adversarial example, we evaluate a strong defense mechanism named Wasserstein smoothing for potential certified robustness defense. Although the defense can achieve some accuracy gain, it still has limitations in many cases and leaves space for developing a stronger certified robustness method to Wasserstein adversarial examples on univariant time series data.
    Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees. (arXiv:2303.12558v1 [cs.LG])
    Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any RL policy. While the related guarantees address relevant practical aspects such as the satisfaction of performance and safety properties, the VAE approach suffers from several learning flaws (posterior collapse, slow learning speed, poor dynamics estimates), primarily due to the absence of abstraction and representation guarantees to support latent optimization. We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. Our approach yields bisimulation guarantees while learning the distilled policy, allowing concrete optimization of the abstraction and representation model quality. Our experiments show that, besides distilling policies up to 10 times faster, the latent model quality is indeed better in general. Moreover, we present experiments from a simple time-to-failure verification algorithm on the latent space. The fact that our approach enables such simple verification techniques highlights its applicability.
    TsSHAP: Robust model agnostic feature-based explainability for time series forecasting. (arXiv:2303.12316v1 [cs.LG])
    A trustworthy machine learning model should be accurate as well as explainable. Understanding why a model makes a certain decision defines the notion of explainability. While various flavors of explainability have been well-studied in supervised learning paradigms like classification and regression, literature on explainability for time series forecasting is relatively scarce. In this paper, we propose a feature-based explainability algorithm, TsSHAP, that can explain the forecast of any black-box forecasting model. The method is agnostic of the forecasting model and can provide explanations for a forecast in terms of interpretable features defined by the user a prior. The explanations are in terms of the SHAP values obtained by applying the TreeSHAP algorithm on a surrogate model that learns a mapping between the interpretable feature space and the forecast of the black-box model. Moreover, we formalize the notion of local, semi-local, and global explanations in the context of time series forecasting, which can be useful in several scenarios. We validate the efficacy and robustness of TsSHAP through extensive experiments on multiple datasets.
    EDGI: Equivariant Diffusion for Planning with Embodied Agents. (arXiv:2303.12410v1 [cs.LG])
    Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries. Most algorithms for planning and model-based reinforcement learning (MBRL) do not take this rich geometric structure into account, leading to sample inefficiency and poor generalization. We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group $\mathrm{SE(3)}$, the discrete-time translation group $\mathbb{Z}$, and the object permutation group $\mathrm{S}_n$. EDGI follows the Diffuser framework (Janner et al. 2022) in treating both learning a world model and planning in it as a conditional generative modeling problem, training a diffusion model on an offline trajectory dataset. We introduce a new $\mathrm{SE(3)} \times \mathbb{Z} \times \mathrm{S}_n$-equivariant diffusion model that supports multiple representations. We integrate this model in a planning loop, where conditioning and classifier-based guidance allow us to softly break the symmetry for specific tasks as needed. On navigation and object manipulation tasks, EDGI improves sample efficiency and generalization.
    DevelSet: Deep Neural Level Set for Instant Mask Optimization. (arXiv:2303.12529v1 [cs.CV])
    With the feature size continuously shrinking in advanced technology nodes, mask optimization is increasingly crucial in the conventional design flow, accompanied by an explosive growth in prohibitive computational overhead in optical proximity correction (OPC) methods. Recently, inverse lithography technique (ILT) has drawn significant attention and is becoming prevalent in emerging OPC solutions. However, ILT methods are either time-consuming or in weak performance of mask printability and manufacturability. In this paper, we present DevelSet, a GPU and deep neural network (DNN) accelerated level set OPC framework for metal layer. We first improve the conventional level set-based ILT algorithm by introducing the curvature term to reduce mask complexity and applying GPU acceleration to overcome computational bottlenecks. To further enhance printability and fast iterative convergence, we propose a novel deep neural network delicately designed with level set intrinsic principles to facilitate the joint optimization of DNN and GPU accelerated level set optimizer. Experimental results show that DevelSet framework surpasses the state-of-the-art methods in printability and boost the runtime performance achieving instant level (around 1 second).
    Logical Expressiveness of Graph Neural Network for Knowledge Graph Reasoning. (arXiv:2303.12306v1 [cs.LG])
    Graph Neural Networks (GNNs) have been recently introduced to learn from knowledge graph (KG) and achieved state-of-the-art performance in KG reasoning. However, a theoretical certification for their good empirical performance is still absent. Besides, while logic in KG is important for inductive and interpretable inference, existing GNN-based methods are just designed to fit data distributions with limited knowledge of their logical expressiveness. We propose to fill the above gap in this paper. Specifically, we theoretically analyze GNN from logical expressiveness and find out what kind of logical rules can be captured from KG. Our results first show that GNN can capture logical rules from graded modal logic, providing a new theoretical tool for analyzing the expressiveness of GNN for KG reasoning; and a query labeling trick makes it easier for GNN to capture logical rules, explaining why SOTA methods are mainly based on labeling trick. Finally, insights from our theory motivate the development of an entity labeling method for capturing difficult logical rules. Experimental results are consistent with our theoretical results and verify the effectiveness of our proposed method.
    Revisiting DeepFool: generalization and improvement. (arXiv:2303.12481v1 [cs.LG])
    Deep neural networks have been known to be vulnerable to adversarial examples, which are inputs that are modified slightly to fool the network into making incorrect predictions. This has led to a significant amount of research on evaluating the robustness of these networks against such perturbations. One particularly important robustness metric is the robustness to minimal l2 adversarial perturbations. However, existing methods for evaluating this robustness metric are either computationally expensive or not very accurate. In this paper, we introduce a new family of adversarial attacks that strike a balance between effectiveness and computational efficiency. Our proposed attacks are generalizations of the well-known DeepFool (DF) attack, while they remain simple to understand and implement. We demonstrate that our attacks outperform existing methods in terms of both effectiveness and computational efficiency. Our proposed attacks are also suitable for evaluating the robustness of large models and can be used to perform adversarial training (AT) to achieve state-of-the-art robustness to minimal l2 adversarial perturbations.
    Adaptive Road Configurations for Improved Autonomous Vehicle-Pedestrian Interactions using Reinforcement Learning. (arXiv:2303.12289v1 [cs.LG])
    The deployment of Autonomous Vehicles (AVs) poses considerable challenges and unique opportunities for the design and management of future urban road infrastructure. In light of this disruptive transformation, the Right-Of-Way (ROW) composition of road space has the potential to be renewed. Design approaches and intelligent control models have been proposed to address this problem, but we lack an operational framework that can dynamically generate ROW plans for AVs and pedestrians in response to real-time demand. Based on microscopic traffic simulation, this study explores Reinforcement Learning (RL) methods for evolving ROW compositions. We implement a centralised paradigm and a distributive learning paradigm to separately perform the dynamic control on several road network configurations. Experimental results indicate that the algorithms have the potential to improve traffic flow efficiency and allocate more space for pedestrians. Furthermore, the distributive learning algorithm outperforms its centralised counterpart regarding computational cost (49.55\%), benchmark rewards (25.35\%), best cumulative rewards (24.58\%), optimal actions (13.49\%) and rate of convergence. This novel road management technique could potentially contribute to the flow-adaptive and active mobility-friendly streets in the AVs era.
    Do Backdoors Assist Membership Inference Attacks?. (arXiv:2303.12589v1 [cs.CR])
    When an adversary provides poison samples to a machine learning model, privacy leakage, such as membership inference attacks that infer whether a sample was included in the training of the model, becomes effective by moving the sample to an outlier. However, the attacks can be detected because inference accuracy deteriorates due to poison samples. In this paper, we discuss a \textit{backdoor-assisted membership inference attack}, a novel membership inference attack based on backdoors that return the adversary's expected output for a triggered sample. We found three crucial insights through experiments with an academic benchmark dataset. We first demonstrate that the backdoor-assisted membership inference attack is unsuccessful. Second, when we analyzed loss distributions to understand the reason for the unsuccessful results, we found that backdoors cannot separate loss distributions of training and non-training samples. In other words, backdoors cannot affect the distribution of clean samples. Third, we also show that poison and triggered samples activate neurons of different distributions. Specifically, backdoors make any clean sample an inlier, contrary to poisoning samples. As a result, we confirm that backdoors cannot assist membership inference.
    Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL. (arXiv:2206.14057v3 [cs.LG] UPDATED)
    Reward-free reinforcement learning (RF-RL), a recently introduced RL paradigm, relies on random action-taking to explore the unknown environment without any reward feedback information. While the primary goal of the exploration phase in RF-RL is to reduce the uncertainty in the estimated model with minimum number of trajectories, in practice, the agent often needs to abide by certain safety constraint at the same time. It remains unclear how such safe exploration requirement would affect the corresponding sample complexity in order to achieve the desired optimality of the obtained policy in planning. In this work, we make a first attempt to answer this question. In particular, we consider the scenario where a safe baseline policy is known beforehand, and propose a unified Safe reWard-frEe ExploraTion (SWEET) framework. We then particularize the SWEET framework to the tabular and the low-rank MDP settings, and develop algorithms coined Tabular-SWEET and Low-rank-SWEET, respectively. Both algorithms leverage the concavity and continuity of the newly introduced truncated value functions, and are guaranteed to achieve zero constraint violation during exploration with high probability. Furthermore, both algorithms can provably find a near-optimal policy subject to any constraint in the planning phase. Remarkably, the sample complexities under both algorithms match or even outperform the state of the art in their constraint-free counterparts up to some constant factors, proving that safety constraint hardly increases the sample complexity for RF-RL.
    Synthetic Health-related Longitudinal Data with Mixed-type Variables Generated using Diffusion Models. (arXiv:2303.12281v1 [cs.LG])
    This paper presents a novel approach to simulating electronic health records (EHRs) using diffusion probabilistic models (DPMs). Specifically, we demonstrate the effectiveness of DPMs in synthesising longitudinal EHRs that capture mixed-type variables, including numeric, binary, and categorical variables. To our knowledge, this represents the first use of DPMs for this purpose. We compared our DPM-simulated datasets to previous state-of-the-art results based on generative adversarial networks (GANs) for two clinical applications: acute hypotension and human immunodeficiency virus (ART for HIV). Given the lack of similar previous studies in DPMs, a core component of our work involves exploring the advantages and caveats of employing DPMs across a wide range of aspects. In addition to assessing the realism of the synthetic datasets, we also trained reinforcement learning (RL) agents on the synthetic data to evaluate their utility for supporting the development of downstream machine learning models. Finally, we estimated that our DPM-simulated datasets are secure and posed a low patient exposure risk for public access.
    Dynamic Relevance Learning for Few-Shot Object Detection. (arXiv:2108.02235v3 [cs.CV] UPDATED)
    Expensive bounding-box annotations have limited the development of object detection task. Thus, it is necessary to focus on more challenging task of few-shot object detection. It requires the detector to recognize objects of novel classes with only a few training samples. Nowadays, many existing popular methods adopting training way similar to meta-learning have achieved promising performance, such as Meta R-CNN series. However, support data is only used as the class attention to guide the detecting of query images each time. Their relevance to each other remains unexploited. Moreover, a lot of recent works treat the support data and query images as independent branch without considering the relationship between them. To address this issue, we propose a dynamic relevance learning model, which utilizes the relationship between all support images and Region of Interest (RoI) on the query images to construct a dynamic graph convolutional network (GCN). By adjusting the prediction distribution of the base detector using the output of this GCN, the proposed model serves as a hard auxiliary classification task, which guides the detector to improve the class representation implicitly. Comprehensive experiments have been conducted on Pascal VOC and MS-COCO dataset. The proposed model achieves the best overall performance, which shows its effectiveness of learning more generalized features. Our code is available at https://github.com/liuweijie19980216/DRL-for-FSOD.
    CgAT: Center-Guided Adversarial Training for Deep Hashing-Based Retrieval. (arXiv:2204.10779v5 [cs.CV] UPDATED)
    Deep hashing has been extensively utilized in massive image retrieval because of its efficiency and effectiveness. However, deep hashing models are vulnerable to adversarial examples, making it essential to develop adversarial defense methods for image retrieval. Existing solutions achieved limited defense performance because of using weak adversarial samples for training and lacking discriminative optimization objectives to learn robust features. In this paper, we present a min-max based Center-guided Adversarial Training, namely CgAT, to improve the robustness of deep hashing networks through worst adversarial examples. Specifically, we first formulate the center code as a semantically-discriminative representative of the input image content, which preserves the semantic similarity with positive samples and dissimilarity with negative examples. We prove that a mathematical formula can calculate the center code immediately. After obtaining the center codes in each optimization iteration of the deep hashing network, they are adopted to guide the adversarial training process. On the one hand, CgAT generates the worst adversarial examples as augmented data by maximizing the Hamming distance between the hash codes of the adversarial examples and the center codes. On the other hand, CgAT learns to mitigate the effects of adversarial samples by minimizing the Hamming distance to the center codes. Extensive experiments on the benchmark datasets demonstrate the effectiveness of our adversarial training algorithm in defending against adversarial attacks for deep hashing-based retrieval. Compared with the current state-of-the-art defense method, we significantly improve the defense performance by an average of 18.61\%, 12.35\%, and 11.56\% on FLICKR-25K, NUS-WIDE, and MS-COCO, respectively. The code is available at https://github.com/xunguangwang/CgAT.
    Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization. (arXiv:2303.12314v1 [cs.CL])
    Prompt tuning is a parameter-efficient method, which learns soft prompts and conditions frozen language models to perform specific downstream tasks. Though effective, prompt tuning under few-shot settings on the one hand heavily relies on a good initialization of soft prompts. On the other hand, it can easily result in overfitting. Existing works leverage pre-training or supervised meta-learning to initialize soft prompts but they cannot data-efficiently generalize to unseen downstream tasks. To address the above problems, this paper proposes a novel Self-sUpervised meta-Prompt learning framework with meta-gradient Regularization for few-shot generalization (SUPMER). We first design a set of self-supervised anchor meta-training tasks with different task formats and further enrich the task distribution with curriculum-based task augmentation. Then a novel meta-gradient regularization method is integrated into meta-prompt learning. It meta-learns to transform the raw gradients during few-shot learning into a domain-generalizable direction, thus alleviating the problem of overfitting. Extensive experiments show that SUPMER achieves better performance for different few-shot downstream tasks, and also exhibits a stronger domain generalization ability.
    SMUG: Towards robust MRI reconstruction by smoothed unrolling. (arXiv:2303.12735v1 [eess.IV])
    Although deep learning (DL) has gained much popularity for accelerated magnetic resonance imaging (MRI), recent studies have shown that DL-based MRI reconstruction models could be oversensitive to tiny input perturbations (that are called 'adversarial perturbations'), which cause unstable, low-quality reconstructed images. This raises the question of how to design robust DL methods for MRI reconstruction. To address this problem, we propose a novel image reconstruction framework, termed SMOOTHED UNROLLING (SMUG), which advances a deep unrolling-based MRI reconstruction model using a randomized smoothing (RS)-based robust learning operation. RS, which improves the tolerance of a model against input noises, has been widely used in the design of adversarial defense for image classification. Yet, we find that the conventional design that applies RS to the entire DL process is ineffective for MRI reconstruction. We show that SMUG addresses the above issue by customizing the RS operation based on the unrolling architecture of the DL-based MRI reconstruction model. Compared to the vanilla RS approach and several variants of SMUG, we show that SMUG improves the robustness of MRI reconstruction with respect to a diverse set of perturbation sources, including perturbations to the input measurements, different measurement sampling rates, and different unrolling steps. Code for SMUG will be available at https://github.com/LGM70/SMUG.
    Estimating the randomness of quantum circuit ensembles up to 50 qubits. (arXiv:2205.09900v3 [quant-ph] UPDATED)
    Random quantum circuits have been utilized in the contexts of quantum supremacy demonstrations, variational quantum algorithms for chemistry and machine learning, and blackhole information. The ability of random circuits to approximate any random unitaries has consequences on their complexity, expressibility, and trainability. To study this property of random circuits, we develop numerical protocols for estimating the frame potential, the distance between a given ensemble and the exact randomness. Our tensor-network-based algorithm has polynomial complexity for shallow circuits and is high-performing using CPU and GPU parallelism. We study 1. local and parallel random circuits to verify the linear growth in complexity as stated by the Brown-Susskind conjecture, and; 2. hardware-efficient ans\"atze to shed light on its expressibility and the barren plateau problem in the context of variational algorithms. Our work shows that large-scale tensor network simulations could provide important hints toward open problems in quantum information science.
    LocalEyenet: Deep Attention framework for Localization of Eyes. (arXiv:2303.12728v1 [cs.CV])
    Development of human machine interface has become a necessity for modern day machines to catalyze more autonomy and more efficiency. Gaze driven human intervention is an effective and convenient option for creating an interface to alleviate human errors. Facial landmark detection is very crucial for designing a robust gaze detection system. Regression based methods capacitate good spatial localization of the landmarks corresponding to different parts of the faces. But there are still scope of improvements which have been addressed by incorporating attention. In this paper, we have proposed a deep coarse-to-fine architecture called LocalEyenet for localization of only the eye regions that can be trained end-to-end. The model architecture, build on stacked hourglass backbone, learns the self-attention in feature maps which aids in preserving global as well as local spatial dependencies in face image. We have incorporated deep layer aggregation in each hourglass to minimize the loss of attention over the depth of architecture. Our model shows good generalization ability in cross-dataset evaluation and in real-time localization of eyes.
    Error Analysis of Physics-Informed Neural Networks for Approximating Dynamic PDEs of Second Order in Time. (arXiv:2303.12245v1 [math.NA])
    We consider the approximation of a class of dynamic partial differential equations (PDE) of second order in time by the physics-informed neural network (PINN) approach, and provide an error analysis of PINN for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. Our analyses show that, with feed-forward neural networks having two hidden layers and the $\tanh$ activation function, the PINN approximation errors for the solution field, its time derivative and its gradient field can be effectively bounded by the training loss and the number of training data points (quadrature points). Our analyses further suggest new forms for the training loss function, which contain certain residuals that are crucial to the error estimate but would be absent from the canonical PINN loss formulation. Adopting these new forms for the loss function leads to a variant PINN algorithm. We present ample numerical experiments with the new PINN algorithm for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation, which show that the method can capture the solution well.
    ExBEHRT: Extended Transformer for Electronic Health Records to Predict Disease Subtypes & Progressions. (arXiv:2303.12364v1 [cs.LG])
    In this study, we introduce ExBEHRT, an extended version of BEHRT (BERT applied to electronic health records), and apply different algorithms to interpret its results. While BEHRT considers only diagnoses and patient age, we extend the feature space to several multimodal records, namely demographics, clinical characteristics, vital signs, smoking status, diagnoses, procedures, medications, and laboratory tests, by applying a novel method to unify the frequencies and temporal dimensions of the different features. We show that additional features significantly improve model performance for various downstream tasks in different diseases. To ensure robustness, we interpret model predictions using an adaptation of expected gradients, which has not been previously applied to transformers with EHR data and provides more granular interpretations than previous approaches such as feature and token importances. Furthermore, by clustering the model representations of oncology patients, we show that the model has an implicit understanding of the disease and is able to classify patients with the same cancer type into different risk groups. Given the additional features and interpretability, ExBEHRT can help make informed decisions about disease trajectories, diagnoses, and risk factors of various diseases.
    Deployment of Image Analysis Algorithms under Prevalence Shifts. (arXiv:2303.12540v1 [cs.CV])
    Domain gaps are among the most relevant roadblocks in the clinical translation of machine learning (ML)-based solutions for medical image analysis. While current research focuses on new training paradigms and network architectures, little attention is given to the specific effect of prevalence shifts on an algorithm deployed in practice. Such discrepancies between class frequencies in the data used for a method's development/validation and that in its deployment environment(s) are of great importance, for example in the context of artificial intelligence (AI) democratization, as disease prevalences may vary widely across time and location. Our contribution is twofold. First, we empirically demonstrate the potentially severe consequences of missing prevalence handling by analyzing (i) the extent of miscalibration, (ii) the deviation of the decision threshold from the optimum, and (iii) the ability of validation metrics to reflect neural network performance on the deployment population as a function of the discrepancy between development and deployment prevalence. Second, we propose a workflow for prevalence-aware image classification that uses estimated deployment prevalences to adjust a trained classifier to a new environment, without requiring additional annotated deployment data. Comprehensive experiments based on a diverse set of 30 medical classification tasks showcase the benefit of the proposed workflow in generating better classifier decisions and more reliable performance estimates compared to current practice.
    Democratising AI: Multiple Meanings, Goals, and Methods. (arXiv:2303.12642v1 [cs.AI])
    Numerous parties are calling for the democratisation of AI, but the phrase is used to refer to a variety of goals, the pursuit of which sometimes conflict. This paper identifies four kinds of AI democratisation that are commonly discussed: (1) the democratisation of AI use, (2) the democratisation of AI development, (3) the democratisation of AI profits, and (4) the democratisation of AI governance. Numerous goals and methods of achieving each form of democratisation are discussed. The main takeaway from this paper is that AI democratisation is a multifarious and sometimes conflicting concept that should not be conflated with improving AI accessibility. If we want to move beyond ambiguous commitments to democratising AI, to productive discussions of concrete policies and trade-offs, then we need to recognise the principal role of the democratisation of AI governance in navigating tradeoffs and risks across decisions around use, development, and profits.
    Anomaly Detection in Aeronautics Data with Quantum-compatible Discrete Deep Generative Model. (arXiv:2303.12302v1 [cs.LG])
    Deep generative learning cannot only be used for generating new data with statistical characteristics derived from input data but also for anomaly detection, by separating nominal and anomalous instances based on their reconstruction quality. In this paper, we explore the performance of three unsupervised deep generative models -- variational autoencoders (VAEs) with Gaussian, Bernoulli, and Boltzmann priors -- in detecting anomalies in flight-operations data of commercial flights consisting of multivariate time series. We devised two VAE models with discrete latent variables (DVAEs), one with a factorized Bernoulli prior and one with a restricted Boltzmann machine (RBM) as prior, because of the demand for discrete-variable models in machine-learning applications and because the integration of quantum devices based on two-level quantum systems requires such models. The DVAE with RBM prior, using a relatively simple -- and classically or quantum-mechanically enhanceable -- sampling technique for the evolution of the RBM's negative phase, performed better than the Bernoulli DVAE and on par with the Gaussian model, which has a continuous latent space. Our studies demonstrate the competitiveness of a discrete deep generative model with its Gaussian counterpart on anomaly-detection tasks. Moreover, the DVAE model with RBM prior can be easily integrated with quantum sampling by outsourcing its generative process to measurements of quantum states obtained from a quantum annealer or gate-model device.  ( 3 min )
    AUTO: Adaptive Outlier Optimization for Online Test-Time OOD Detection. (arXiv:2303.12267v1 [cs.LG])
    Out-of-distribution (OOD) detection is a crucial aspect of deploying machine learning models in open-world applications. Empirical evidence suggests that training with auxiliary outliers substantially improves OOD detection. However, such outliers typically exhibit a distribution gap compared to the test OOD data and do not cover all possible test OOD scenarios. Additionally, incorporating these outliers introduces additional training burdens. In this paper, we introduce a novel paradigm called test-time OOD detection, which utilizes unlabeled online data directly at test time to improve OOD detection performance. While this paradigm is efficient, it also presents challenges such as catastrophic forgetting. To address these challenges, we propose adaptive outlier optimization (AUTO), which consists of an in-out-aware filter, an ID memory bank, and a semantically-consistent objective. AUTO adaptively mines pseudo-ID and pseudo-OOD samples from test data, utilizing them to optimize networks in real time during inference. Extensive results on CIFAR-10, CIFAR-100, and ImageNet benchmarks demonstrate that AUTO significantly enhances OOD detection performance.
    Prototype Helps Federated Learning: Towards Faster Convergence. (arXiv:2303.12296v1 [cs.LG])
    Federated learning (FL) is a distributed machine learning technique in which multiple clients cooperate to train a shared model without exchanging their raw data. However, heterogeneity of data distribution among clients usually leads to poor model inference. In this paper, a prototype-based federated learning framework is proposed, which can achieve better inference performance with only a few changes to the last global iteration of the typical federated learning process. In the last iteration, the server aggregates the prototypes transmitted from distributed clients and then sends them back to local clients for their respective model inferences. Experiments on two baseline datasets show that our proposal can achieve higher accuracy (at least 1%) and relatively efficient communication than two popular baselines under different heterogeneous settings.
    Reducing Air Pollution through Machine Learning. (arXiv:2303.12285v1 [cs.LG])
    This paper presents a data-driven approach to mitigate the effects of air pollution from industrial plants on nearby cities by linking operational decisions with weather conditions. Our method combines predictive and prescriptive machine learning models to forecast short-term wind speed and direction and recommend operational decisions to reduce or pause the industrial plant's production. We exhibit several trade-offs between reducing environmental impact and maintaining production activities. The predictive component of our framework employs various machine learning models, such as gradient-boosted tree-based models and ensemble methods, for time series forecasting. The prescriptive component utilizes interpretable optimal policy trees to propose multiple trade-offs, such as reducing dangerous emissions by 33-47% and unnecessary costs by 40-63%. Our deployed models significantly reduced forecasting errors, with a range of 38-52% for less than 12-hour lead time and 14-46% for 12 to 48-hour lead time compared to official weather forecasts. We have successfully implemented the predictive component at the OCP Safi site, which is Morocco's largest chemical industrial plant, and are currently in the process of deploying the prescriptive component. Our framework enables sustainable industrial development by eliminating the pollution-industrial activity trade-off through data-driven weather-based operational decisions, significantly enhancing factory optimization and sustainability. This modernizes factory planning and resource allocation while maintaining environmental compliance. The predictive component has boosted production efficiency, leading to cost savings and reduced environmental impact by minimizing air pollution.
    Training Multilayer Perceptrons by Sampling with Quantum Annealers. (arXiv:2303.12352v1 [cs.LG])
    A successful application of quantum annealing to machine learning is training restricted Boltzmann machines (RBM). However, many neural networks for vision applications are feedforward structures, such as multilayer perceptrons (MLP). Backpropagation is currently the most effective technique to train MLPs for supervised learning. This paper aims to be forward-looking by exploring the training of MLPs using quantum annealers. We exploit an equivalence between MLPs and energy-based models (EBM), which are a variation of RBMs with a maximum conditional likelihood objective. This leads to a strategy to train MLPs with quantum annealers as a sampling engine. We prove our setup for MLPs with sigmoid activation functions and one hidden layer, and demonstrated training of binary image classifiers on small subsets of the MNIST and Fashion-MNIST datasets using the D-Wave quantum annealer. Although problem sizes that are feasible on current annealers are limited, we obtained comprehensive results on feasible instances that validate our ideas. Our work establishes the potential of quantum computing for training MLPs.
    Infrastructure-based End-to-End Learning and Prevention of Driver Failure. (arXiv:2303.12224v1 [cs.RO])
    Intelligent intersection managers can improve safety by detecting dangerous drivers or failure modes in autonomous vehicles, warning oncoming vehicles as they approach an intersection. In this work, we present FailureNet, a recurrent neural network trained end-to-end on trajectories of both nominal and reckless drivers in a scaled miniature city. FailureNet observes the poses of vehicles as they approach an intersection and detects whether a failure is present in the autonomy stack, warning cross-traffic of potentially dangerous drivers. FailureNet can accurately identify control failures, upstream perception errors, and speeding drivers, distinguishing them from nominal driving. The network is trained and deployed with autonomous vehicles in the MiniCity. Compared to speed or frequency-based predictors, FailureNet's recurrent neural network structure provides improved predictive power, yielding upwards of 84% accuracy when deployed on hardware.  ( 2 min )
    Delay-Aware Hierarchical Federated Learning. (arXiv:2303.12414v1 [cs.LG])
    Federated learning has gained popularity as a means of training models distributed across the wireless edge. The paper introduces delay-aware federated learning (DFL) to improve the efficiency of distributed machine learning (ML) model training by addressing communication delays between edge and cloud. DFL employs multiple stochastic gradient descent iterations on device datasets during each global aggregation interval and intermittently aggregates model parameters through edge servers in local subnetworks. The cloud server synchronizes the local models with the global deployed model computed via a local-global combiner at global synchronization. The convergence behavior of DFL is theoretically investigated under a generalized data heterogeneity metric. A set of conditions is obtained to achieve the sub-linear convergence rate of O(1/k). Based on these findings, an adaptive control algorithm is developed for DFL, implementing policies to mitigate energy consumption and edge-to-cloud communication latency while aiming for a sublinear convergence rate. Numerical evaluations show DFL's superior performance in terms of faster global model convergence, reduced resource consumption, and robustness against communication delays compared to existing FL algorithms. In summary, this proposed method offers improved efficiency and satisfactory results when dealing with both convex and non-convex loss functions.  ( 2 min )
    Learning Human-Inspired Force Strategies for Robotic Assembly. (arXiv:2303.12440v1 [cs.RO])
    The programming of robotic assembly tasks is a key component in manufacturing and automation. Force-sensitive assembly, however, often requires reactive strategies to handle slight changes in positioning and unforeseen part jamming. Learning such strategies from human performance is a promising approach, but faces two common challenges: the handling of low part clearances which is difficult to capture from demonstrations and learning intuitive strategies offline without access to the real hardware. We address these two challenges by learning probabilistic force strategies from data that are easily acquired offline in a robot-less simulation from human demonstrations with a joystick. We combine a Long Short Term Memory (LSTM) and a Mixture Density Network (MDN) to model human-inspired behavior in such a way that the learned strategies transfer easily onto real hardware. The experiments show a UR10e robot that completes a plastic assembly with clearances of less than 100 micrometers whose strategies were solely demonstrated in simulation.
    Deep Reinforcement Learning for Localizability-Enhanced Navigation in Dynamic Human Environments. (arXiv:2303.12354v1 [cs.RO])
    Reliable localization is crucial for autonomous robots to navigate efficiently and safely. Some navigation methods can plan paths with high localizability (which describes the capability of acquiring reliable localization). By following these paths, the robot can access the sensor streams that facilitate more accurate location estimation results by the localization algorithms. However, most of these methods require prior knowledge and struggle to adapt to unseen scenarios or dynamic changes. To overcome these limitations, we propose a novel approach for localizability-enhanced navigation via deep reinforcement learning in dynamic human environments. Our proposed planner automatically extracts geometric features from 2D laser data that are helpful for localization. The planner learns to assign different importance to the geometric features and encourages the robot to navigate through areas that are helpful for laser localization. To facilitate the learning of the planner, we suggest two techniques: (1) an augmented state representation that considers the dynamic changes and the confidence of the localization results, which provides more information and allows the robot to make better decisions, (2) a reward metric that is capable to offer both sparse and dense feedback on behaviors that affect localization accuracy. Our method exhibits significant improvements in lost rate and arrival rate when tested in previously unseen environments.  ( 2 min )
    Re-thinking Federated Active Learning based on Inter-class Diversity. (arXiv:2303.12317v1 [cs.CV])
    Although federated learning has made awe-inspiring advances, most studies have assumed that the client's data are fully labeled. However, in a real-world scenario, every client may have a significant amount of unlabeled instances. Among the various approaches to utilizing unlabeled data, a federated active learning framework has emerged as a promising solution. In the decentralized setting, there are two types of available query selector models, namely 'global' and 'local-only' models, but little literature discusses their performance dominance and its causes. In this work, we first demonstrate that the superiority of two selector models depends on the global and local inter-class diversity. Furthermore, we observe that the global and local-only models are the keys to resolving the imbalance of each side. Based on our findings, we propose LoGo, a FAL sampling strategy robust to varying local heterogeneity levels and global imbalance ratio, that integrates both models by two steps of active selection scheme. LoGo consistently outperforms six active learning strategies in the total number of 38 experimental settings.  ( 2 min )
    EasyDGL: Encode, Train and Interpret for Continuous-time Dynamic Graph Learning. (arXiv:2303.12341v1 [cs.LG])
    Dynamic graphs arise in various real-world applications, and it is often welcomed to model the dynamics directly in continuous time domain for its flexibility. This paper aims to design an easy-to-use pipeline (termed as EasyDGL which is also due to its implementation by DGL toolkit) composed of three key modules with both strong fitting ability and interpretability. Specifically the proposed pipeline which involves encoding, training and interpreting: i) a temporal point process (TPP) modulated attention architecture to endow the continuous-time resolution with the coupled spatiotemporal dynamics of the observed graph with edge-addition events; ii) a principled loss composed of task-agnostic TPP posterior maximization based on observed events on the graph, and a task-aware loss with a masking strategy over dynamic graph, where the covered tasks include dynamic link prediction, dynamic node classification and node traffic forecasting; iii) interpretation of the model outputs (e.g., representations and predictions) with scalable perturbation-based quantitative analysis in the graph Fourier domain, which could more comprehensively reflect the behavior of the learned model. Extensive experimental results on public benchmarks show the superior performance of our EasyDGL for time-conditioned predictive tasks, and in particular demonstrate that EasyDGL can effectively quantify the predictive power of frequency content that a model learn from the evolving graph data.  ( 2 min )
    Joint Liver and Hepatic Lesion Segmentation in MRI using a Hybrid CNN with Transformer Layers. (arXiv:2201.10981v3 [eess.IV] UPDATED)
    Deep learning-based segmentation of the liver and hepatic lesions therein steadily gains relevance in clinical practice due to the increasing incidence of liver cancer each year. Whereas various network variants with overall promising results in the field of medical image segmentation have been successfully developed over the last years, almost all of them struggle with the challenge of accurately segmenting hepatic lesions in magnetic resonance imaging (MRI). This led to the idea of combining elements of convolutional and transformer-based architectures to overcome the existing limitations. This work presents a hybrid network called SWTR-Unet, consisting of a pretrained ResNet, transformer blocks as well as a common Unet-style decoder path. This network was primarily applied to single-modality non-contrast-enhanced liver MRI and additionally to the publicly available computed tomography (CT) data of the liver tumor segmentation (LiTS) challenge to verify the applicability on other modalities. For a broader evaluation, multiple state-of-the-art networks were implemented and applied, ensuring a direct comparability. Furthermore, correlation analysis and an ablation study were carried out, to investigate various influencing factors on the segmentation accuracy of the presented method. With Dice scores of averaged 98+-2% for liver and 81+-28% lesion segmentation on the MRI dataset and 97+-2% and 79+-25%, respectively on the CT dataset, the proposed SWTR-Unet proved to be a precise approach for liver and hepatic lesion segmentation with state-of-the-art results for MRI and competing accuracy in CT imaging. The achieved segmentation accuracy was found to be on par with manually performed expert segmentations as indicated by inter-observer variabilities for liver lesion segmentation. In conclusion, the presented method could save valuable time and resources in clinical practice.
    Reply to: Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems. (arXiv:2303.12096v1 [cs.LG])
    We provide a comprehensive reply to the comment written by Stefan Boettcher [arXiv:2210.00623] and argue that the comment singles out one particular non-representative example problem, entirely focusing on the maximum cut problem (MaxCut) on sparse graphs, for which greedy algorithms are expected to perform well. Conversely, we highlight the broader algorithmic development underlying our original work, and (within our original framework) provide additional numerical results showing sizable improvements over our original data, thereby refuting the comment's original performance statements. Furthermore, it has already been shown that physics-inspired graph neural networks (PI-GNNs) can outperform greedy algorithms, in particular on hard, dense instances. We also argue that the internal (parallel) anatomy of graph neural networks is very different from the (sequential) nature of greedy algorithms, and (based on their usage at the scale of real-world social networks) point out that graph neural networks have demonstrated their potential for superior scalability compared to existing heuristics such as extremal optimization. Finally, we conclude highlighting the conceptual novelty of our work and outline some potential extensions.  ( 3 min )
    Viscoelastic Constitutive Artificial Neural Networks (vCANNs) $-$ a framework for data-driven anisotropic nonlinear finite viscoelasticity. (arXiv:2303.12164v1 [cond-mat.mtrl-sci])
    The constitutive behavior of polymeric materials is often modeled by finite linear viscoelastic (FLV) or quasi-linear viscoelastic (QLV) models. These popular models are simplifications that typically cannot accurately capture the nonlinear viscoelastic behavior of materials. For example, the success of attempts to capture strain rate-dependent behavior has been limited so far. To overcome this problem, we introduce viscoelastic Constitutive Artificial Neural Networks (vCANNs), a novel physics-informed machine learning framework for anisotropic nonlinear viscoelasticity at finite strains. vCANNs rely on the concept of generalized Maxwell models enhanced with nonlinear strain (rate)-dependent properties represented by neural networks. The flexibility of vCANNs enables them to automatically identify accurate and sparse constitutive models of a broad range of materials. To test vCANNs, we trained them on stress-strain data from Polyvinyl Butyral, the electro-active polymers VHB 4910 and 4905, and a biological tissue, the rectus abdominis muscle. Different loading conditions were considered, including relaxation tests, cyclic tension-compression tests, and blast loads. We demonstrate that vCANNs can learn to capture the behavior of all these materials accurately and computationally efficiently without human guidance.  ( 2 min )
    Mixing Backward- with Forward-Chaining for Metacognitive Skill Acquisition and Transfer. (arXiv:2303.12223v1 [cs.HC])
    Metacognitive skills have been commonly associated with preparation for future learning in deductive domains. Many researchers have regarded strategy- and time-awareness as two metacognitive skills that address how and when to use a problem-solving strategy, respectively. It was shown that students who are both strategy-and time-aware (StrTime) outperformed their nonStrTime peers across deductive domains. In this work, students were trained on a logic tutor that supports a default forward-chaining (FC) and a backward-chaining (BC) strategy. We investigated the impact of mixing BC with FC on teaching strategy- and time-awareness for nonStrTime students. During the logic instruction, the experimental students (Exp) were provided with two BC worked examples and some problems in BC to practice how and when to use BC. Meanwhile, their control (Ctrl) and StrTime peers received no such intervention. Six weeks later, all students went through a probability tutor that only supports BC to evaluate whether the acquired metacognitive skills are transferred from logic. Our results show that on both tutors, Exp outperformed Ctrl and caught up with StrTime.  ( 2 min )
    Learning a Depth Covariance Function. (arXiv:2303.12157v1 [cs.CV])
    We propose learning a depth covariance function with applications to geometric vision tasks. Given RGB images as input, the covariance function can be flexibly used to define priors over depth functions, predictive distributions given observations, and methods for active point selection. We leverage these techniques for a selection of downstream tasks: depth completion, bundle adjustment, and monocular dense visual odometry.  ( 2 min )
    Thrill-K Architecture: Towards a Solution to the Problem of Knowledge Based Understanding. (arXiv:2303.12084v1 [cs.LG])
    While end-to-end learning systems are rapidly gaining capabilities and popularity, the increasing computational demands for deploying such systems, along with a lack of flexibility, adaptability, explainability, reasoning and verification capabilities, require new types of architectures. Here we introduce a classification of hybrid systems which, based on an analysis of human knowledge and intelligence, combines neural learning with various types of knowledge and knowledge sources. We present the Thrill-K architecture as a prototypical solution for integrating instantaneous knowledge, standby knowledge and external knowledge sources in a framework capable of inference, learning and intelligent control.  ( 2 min )
    Improving Fabrication Fidelity of Integrated Nanophotonic Devices Using Deep Learning. (arXiv:2303.12136v1 [cs.LG])
    Next-generation integrated nanophotonic device designs leverage advanced optimization techniques such as inverse design and topology optimization which achieve high performance and extreme miniaturization by optimizing a massively complex design space enabled by small feature sizes. However, unless the optimization is heavily constrained, the generated small features are not reliably fabricated, leading to optical performance degradation. Even for simpler, conventional designs, fabrication-induced performance degradation still occurs. The degree of deviation from the original design not only depends on the size and shape of its features, but also on the distribution of features and the surrounding environment, presenting complex, proximity-dependent behavior. Without proprietary fabrication process specifications, design corrections can only be made after calibrating fabrication runs take place. In this work, we introduce a general deep machine learning model that automatically corrects photonic device design layouts prior to first fabrication. Only a small set of scanning electron microscopy images of engineered training features are required to create the deep learning model. With correction, the outcome of the fabricated layout is closer to what is intended, and thus so too is the performance of the design. Without modifying the nanofabrication process, adding significant computation in design, or requiring proprietary process specifications, we believe our model opens the door to new levels of reliability and performance in next-generation photonic circuits.  ( 2 min )
    CLSA: Contrastive Learning-based Survival Analysis for Popularity Prediction in MEC Networks. (arXiv:2303.12097v1 [cs.LG])
    Mobile Edge Caching (MEC) integrated with Deep Neural Networks (DNNs) is an innovative technology with significant potential for the future generation of wireless networks, resulting in a considerable reduction in users' latency. The MEC network's effectiveness, however, heavily relies on its capacity to predict and dynamically update the storage of caching nodes with the most popular contents. To be effective, a DNN-based popularity prediction model needs to have the ability to understand the historical request patterns of content, including their temporal and spatial correlations. Existing state-of-the-art time-series DNN models capture the latter by simultaneously inputting the sequential request patterns of multiple contents to the network, considerably increasing the size of the input sample. This motivates us to address this challenge by proposing a DNN-based popularity prediction framework based on the idea of contrasting input samples against each other, designed for the Unmanned Aerial Vehicle (UAV)-aided MEC networks. Referred to as the Contrastive Learning-based Survival Analysis (CLSA), the proposed architecture consists of a self-supervised Contrastive Learning (CL) model, where the temporal information of sequential requests is learned using a Long Short Term Memory (LSTM) network as the encoder of the CL architecture. Followed by a Survival Analysis (SA) network, the output of the proposed CLSA architecture is probabilities for each content's future popularity, which are then sorted in descending order to identify the Top-K popular contents. Based on the simulation results, the proposed CLSA architecture outperforms its counterparts across the classification accuracy and cache-hit ratio.  ( 3 min )
    Community detection in complex networks via node similarity, graph representation learning, and hierarchical clustering. (arXiv:2303.12212v1 [cs.SI])
    Community detection is a critical challenge in the analysis of real-world graphs and complex networks, including social, transportation, citation, cybersecurity networks, and food webs. Motivated by many similarities between community detection and clustering in Euclidean spaces, we propose three algorithm frameworks to apply hierarchical clustering methods for community detection in graphs. We show that using our methods, it is possible to apply various linkage-based (single-, complete-, average- linkage, Ward, Genie) clustering algorithms to find communities based on vertex similarity matrices, eigenvector matrices thereof, and Euclidean vector representations of nodes. We convey a comprehensive analysis of choices for each framework, including state-of-the-art graph representation learning algorithms, such as Deep Neural Graph Representation, and a vertex proximity matrix known to yield high-quality results in machine learning -- Positive Pointwise Mutual Information. Overall, we test over a hundred combinations of framework components and show that some -- including Wasserman-Faust and PPMI proximity, DNGR representation -- can compete with algorithms such as state-of-the-art Leiden and Louvain and easily outperform other known community detection algorithms. Notably, our algorithms remain hierarchical and allow the user to specify any number of clusters a priori.  ( 2 min )
    Exploring the Benefits of Visual Prompting in Differential Privacy. (arXiv:2303.12247v1 [cs.CV])
    Visual Prompting (VP) is an emerging and powerful technique that allows sample-efficient adaptation to downstream tasks by engineering a well-trained frozen source model. In this work, we explore the benefits of VP in constructing compelling neural network classifiers with differential privacy (DP). We explore and integrate VP into canonical DP training methods and demonstrate its simplicity and efficiency. In particular, we discover that VP in tandem with PATE, a state-of-the-art DP training method that leverages the knowledge transfer from an ensemble of teachers, achieves the state-of-the-art privacy-utility trade-off with minimum expenditure of privacy budget. Moreover, we conduct additional experiments on cross-domain image classification with a sufficient domain gap to further unveil the advantage of VP in DP. Lastly, we also conduct extensive ablation studies to validate the effectiveness and contribution of VP under DP consideration.  ( 2 min )
    Analytical Conjugate Priors for Subclasses of Generalized Pareto Distributions. (arXiv:2303.12199v1 [cs.LG])
    This article is written for pedagogical purposes aiming at practitioners trying to estimate the finite support of continuous probability distributions, i.e., the minimum and the maximum of a distribution defined on a finite domain. Generalized Pareto distribution GP({\theta}, {\sigma}, {\xi}) is a three-parameter distribution which plays a key role in Peaks-Over-Threshold framework for tail estimation in Extreme Value Theory. Estimators for GP often lack analytical solutions and the best known Bayesian methods for GP involves numerical methods. Moreover, existing literature focuses on estimating the scale {\sigma} and the shape {\xi}, lacking discussion of the estimation of the location {\theta} which is the lower support of (minimum value possible in) a GP. To fill the gap, we analyze four two-parameter subclasses of GP whose conjugate priors can be obtained analytically, although some of the results are known. Namely, we prove the conjugacy for {\xi} > 0 (Pareto), {\xi} = 0 (Shifted Exponential), {\xi} < 0 (Power), and {\xi} = -1 (Two-parameter Uniform).  ( 2 min )
    Universal Approximation Property of Hamiltonian Deep Neural Networks. (arXiv:2303.12147v1 [cs.LG])
    This paper investigates the universal approximation capabilities of Hamiltonian Deep Neural Networks (HDNNs) that arise from the discretization of Hamiltonian Neural Ordinary Differential Equations. Recently, it has been shown that HDNNs enjoy, by design, non-vanishing gradients, which provide numerical stability during training. However, although HDNNs have demonstrated state-of-the-art performance in several applications, a comprehensive study to quantify their expressivity is missing. In this regard, we provide a universal approximation theorem for HDNNs and prove that a portion of the flow of HDNNs can approximate arbitrary well any continuous function over a compact domain. This result provides a solid theoretical foundation for the practical use of HDNNs.  ( 2 min )
    Secure Aggregation in Federated Learning is not Private: Leaking User Data at Large Scale through Model Modification. (arXiv:2303.12233v1 [cs.LG])
    Security and privacy are important concerns in machine learning. End user devices often contain a wealth of data and this information is sensitive and should not be shared with servers or enterprises. As a result, federated learning was introduced to enable machine learning over large decentralized datasets while promising privacy by eliminating the need for data sharing. However, prior work has shown that shared gradients often contain private information and attackers can gain knowledge either through malicious modification of the architecture and parameters or by using optimization to approximate user data from the shared gradients. Despite this, most attacks have so far been limited in scale of number of clients, especially failing when client gradients are aggregated together using secure model aggregation. The attacks that still function are strongly limited in the number of clients attacked, amount of training samples they leak, or number of iterations they take to be trained. In this work, we introduce MANDRAKE, an attack that overcomes previous limitations to directly leak large amounts of client data even under secure aggregation across large numbers of clients. Furthermore, we break the anonymity of aggregation as the leaked data is identifiable and directly tied back to the clients they come from. We show that by sending clients customized convolutional parameters, the weight gradients of data points between clients will remain separate through aggregation. With an aggregation across many clients, prior work could only leak less than 1% of images. With the same number of non-zero parameters, and using only a single training iteration, MANDRAKE leaks 70-80% of data samples.  ( 3 min )
    Policy Optimization for Personalized Interventions in Behavioral Health. (arXiv:2303.12206v1 [cs.LG])
    Problem definition: Behavioral health interventions, delivered through digital platforms, have the potential to significantly improve health outcomes, through education, motivation, reminders, and outreach. We study the problem of optimizing personalized interventions for patients to maximize some long-term outcome, in a setting where interventions are costly and capacity-constrained. Methodology/results: This paper provides a model-free approach to solving this problem. We find that generic model-free approaches from the reinforcement learning literature are too data intensive for healthcare applications, while simpler bandit approaches make progress at the expense of ignoring long-term patient dynamics. We present a new algorithm we dub DecompPI that approximates one step of policy iteration. Implementing DecompPI simply consists of a prediction task from offline data, alleviating the need for online experimentation. Theoretically, we show that under a natural set of structural assumptions on patient dynamics, DecompPI surprisingly recovers at least 1/2 of the improvement possible between a naive baseline policy and the optimal policy. At the same time, DecompPI is both robust to estimation errors and interpretable. Through an empirical case study on a mobile health platform for improving treatment adherence for tuberculosis, we find that DecompPI can provide the same efficacy as the status quo with approximately half the capacity of interventions. Managerial implications: DecompPI is general and is easily implementable for organizations aiming to improve long-term behavior through targeted interventions. Our case study suggests that the platform's costs of deploying interventions can potentially be cut by 50%, which facilitates the ability to scale up the system in a cost-efficient fashion.  ( 2 min )
    Neural Pre-Processing: A Learning Framework for End-to-end Brain MRI Pre-processing. (arXiv:2303.12148v1 [eess.IV])
    Head MRI pre-processing involves converting raw images to an intensity-normalized, skull-stripped brain in a standard coordinate space. In this paper, we propose an end-to-end weakly supervised learning approach, called Neural Pre-processing (NPP), for solving all three sub-tasks simultaneously via a neural network, trained on a large dataset without individual sub-task supervision. Because the overall objective is highly under-constrained, we explicitly disentangle geometric-preserving intensity mapping (skull-stripping and intensity normalization) and spatial transformation (spatial normalization). Quantitative results show that our model outperforms state-of-the-art methods which tackle only a single sub-task. Our ablation experiments demonstrate the importance of the architecture design we chose for NPP. Furthermore, NPP affords the user the flexibility to control each of these tasks at inference time. The code and model are freely-available at \url{https://github.com/Novestars/Neural-Pre-processing}.  ( 2 min )
    Fundamentals of Generative Large Language Models and Perspectives in Cyber-Defense. (arXiv:2303.12132v1 [cs.CL])
    Generative Language Models gained significant attention in late 2022 / early 2023, notably with the introduction of models refined to act consistently with users' expectations of interactions with AI (conversational models). Arguably the focal point of public attention has been such a refinement of the GPT3 model -- the ChatGPT and its subsequent integration with auxiliary capabilities, including search as part of Microsoft Bing. Despite extensive prior research invested in their development, their performance and applicability to a range of daily tasks remained unclear and niche. However, their wider utilization without a requirement for technical expertise, made in large part possible through conversational fine-tuning, revealed the extent of their true capabilities in a real-world environment. This has garnered both public excitement for their potential applications and concerns about their capabilities and potential malicious uses. This review aims to provide a brief overview of the history, state of the art, and implications of Generative Language Models in terms of their principles, abilities, limitations, and future prospects -- especially in the context of cyber-defense, with a focus on the Swiss operational environment.  ( 2 min )
    A Random Projection k Nearest Neighbours Ensemble for Classification via Extended Neighbourhood Rule. (arXiv:2303.12210v1 [stat.ML])
    Ensembles based on k nearest neighbours (kNN) combine a large number of base learners, each constructed on a sample taken from a given training data. Typical kNN based ensembles determine the k closest observations in the training data bounded to a test sample point by a spherical region to predict its class. In this paper, a novel random projection extended neighbourhood rule (RPExNRule) ensemble is proposed where bootstrap samples from the given training data are randomly projected into lower dimensions for additional randomness in the base models and to preserve features information. It uses the extended neighbourhood rule (ExNRule) to fit kNN as base learners on randomly projected bootstrap samples.  ( 2 min )
    Training and Deploying Spiking NN Applications to the Mixed-Signal Neuromorphic Chip Dynap-SE2 with Rockpool. (arXiv:2303.12167v1 [cs.ET])
    Mixed-signal neuromorphic processors provide extremely low-power operation for edge inference workloads, taking advantage of sparse asynchronous computation within Spiking Neural Networks (SNNs). However, deploying robust applications to these devices is complicated by limited controllability over analog hardware parameters, unintended parameter and dynamics variations of analog circuits due to fabrication non-idealities. Here we demonstrate a novel methodology for offline training and deployment of spiking neural networks (SNNs) to the mixed-signal neuromorphic processor Dynap-SE2. The methodology utilizes an unsupervised weight quantization method to optimize the network's parameters, coupled with adversarial parameter noise injection during training. The optimized network is shown to be robust to the effects of quantization and device mismatch, making the method a promising candidate for real-world applications with hardware constraints. This work extends Rockpool, an open-source deep-learning library for SNNs, with support accurate simulation of mixed-signal SNN dynamics. Our approach simplifies the development and deployment process for the neuromorphic community, making mixed-signal neuromorphic processors more accessible to researchers and developers.  ( 2 min )
    MAGVLT: Masked Generative Vision-and-Language Transformer. (arXiv:2303.12208v1 [cs.CV])
    While generative modeling on multimodal image-text data has been actively developed with large-scale paired datasets, there have been limited attempts to generate both image and text data by a single model rather than a generation of one fixed modality conditioned on the other modality. In this paper, we explore a unified generative vision-and-language (VL) model that can produce both images and text sequences. Especially, we propose a generative VL transformer based on the non-autoregressive mask prediction, named MAGVLT, and compare it with an autoregressive generative VL transformer (ARGVLT). In comparison to ARGVLT, the proposed MAGVLT enables bidirectional context encoding, fast decoding by parallel token predictions in an iterative refinement, and extended editing capabilities such as image and text infilling. For rigorous training of our MAGVLT with image-text pairs from scratch, we combine the image-to-text, text-to-image, and joint image-and-text mask prediction tasks. Moreover, we devise two additional tasks based on the step-unrolled mask prediction and the selective prediction on the mixture of two image-text pairs. Experimental results on various downstream generation tasks of VL benchmarks show that our MAGVLT outperforms ARGVLT by a large margin even with significant inference speedup. Particularly, MAGVLT achieves competitive results on both zero-shot image-to-text and text-to-image generation tasks from MS-COCO by one moderate-sized model (fewer than 500M parameters) even without the use of monomodal data and networks.  ( 2 min )
    Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning. (arXiv:2303.12091v1 [cs.LG])
    Semi-supervised learning (SSL) methods assume that labeled data, unlabeled data and test data are from the same distribution. Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers). Most previous works focused on outlier detection via binary classifiers, which suffer from insufficient scalability and inability to distinguish different types of uncertainty. In this paper, we propose a novel framework, Adaptive Negative Evidential Deep Learning (ANEDL) to tackle these limitations. Concretely, we first introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference. Furthermore, we propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers. As demonstrated empirically, our proposed method outperforms existing state-of-the-art methods across four datasets.  ( 2 min )
    ChatGPT for Programming Numerical Methods. (arXiv:2303.12093v1 [cs.LG])
    ChatGPT is a large language model trained by OpenAI. In this technical report, we explore for the first time the capability of ChatGPT for programming numerical algorithms. Specifically, we examine the capability of GhatGPT for generating codes for numerical algorithms in different programming languages, for debugging and improving written codes by users, for completing missed parts of numerical codes, rewriting available codes in other programming languages, and for parallelizing serial codes. Additionally, we assess if ChatGPT can recognize if given codes are written by humans or machines. To reach this goal, we consider a variety of mathematical problems such as the Poisson equation, the diffusion equation, the incompressible Navier-Stokes equations, compressible inviscid flow, eigenvalue problems, solving linear systems of equations, storing sparse matrices, etc. Furthermore, we exemplify scientific machine learning such as physics-informed neural networks and convolutional neural networks with applications to computational physics. Through these examples, we investigate the successes, failures, and challenges of ChatGPT. Examples of failures are producing singular matrices, operations on arrays with incompatible sizes, programming interruption for relatively long codes, etc. Our outcomes suggest that ChatGPT can successfully program numerical algorithms in different programming languages, but certain limitations and challenges exist that require further improvement of this machine learning model.  ( 2 min )
  • Open

    Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL. (arXiv:2206.14057v3 [cs.LG] UPDATED)
    Reward-free reinforcement learning (RF-RL), a recently introduced RL paradigm, relies on random action-taking to explore the unknown environment without any reward feedback information. While the primary goal of the exploration phase in RF-RL is to reduce the uncertainty in the estimated model with minimum number of trajectories, in practice, the agent often needs to abide by certain safety constraint at the same time. It remains unclear how such safe exploration requirement would affect the corresponding sample complexity in order to achieve the desired optimality of the obtained policy in planning. In this work, we make a first attempt to answer this question. In particular, we consider the scenario where a safe baseline policy is known beforehand, and propose a unified Safe reWard-frEe ExploraTion (SWEET) framework. We then particularize the SWEET framework to the tabular and the low-rank MDP settings, and develop algorithms coined Tabular-SWEET and Low-rank-SWEET, respectively. Both algorithms leverage the concavity and continuity of the newly introduced truncated value functions, and are guaranteed to achieve zero constraint violation during exploration with high probability. Furthermore, both algorithms can provably find a near-optimal policy subject to any constraint in the planning phase. Remarkably, the sample complexities under both algorithms match or even outperform the state of the art in their constraint-free counterparts up to some constant factors, proving that safety constraint hardly increases the sample complexity for RF-RL.
    Unbiased Supervised Contrastive Learning. (arXiv:2211.05568v3 [cs.LG] UPDATED)
    Many datasets are biased, namely they contain easy-to-learn features that are highly correlated with the target class only in the dataset but not in the true underlying distribution of the data. For this reason, learning unbiased models from biased data has become a very relevant research topic in the last years. In this work, we tackle the problem of learning representations that are robust to biases. We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data. Based on that, we derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples. Furthermore, thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data. We validate the proposed losses on standard vision datasets including CIFAR10, CIFAR100, and ImageNet, and we assess the debiasing capability of FairKL with epsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biased datasets, including real instances of biases in the wild.
    Retire: Robust Expectile Regression in High Dimensions. (arXiv:2212.05562v2 [stat.ME] UPDATED)
    High-dimensional data can often display heterogeneity due to heteroscedastic variance or inhomogeneous covariate effects. Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data. The former is computationally challenging due to the non-smooth nature of the check loss, and the latter is sensitive to heavy-tailed error distributions. In this paper, we propose and study (penalized) robust expectile regression (retire), with a focus on iteratively reweighted $\ell_1$-penalization which reduces the estimation bias from $\ell_1$-penalization and leads to oracle properties. Theoretically, we establish the statistical properties of the retire estimator under two regimes: (i) low-dimensional regime in which $d \ll n$; (ii) high-dimensional regime in which $s\ll n\ll d$ with $s$ denoting the number of significant predictors. In the high-dimensional setting, we carefully characterize the solution path of the iteratively reweighted $\ell_1$-penalized retire estimation, adapted from the local linear approximation algorithm for folded-concave regularization. Under a mild minimum signal strength condition, we show that after as many as $\log(\log d)$ iterations the final iterate enjoys the oracle convergence rate. At each iteration, the weighted $\ell_1$-penalized convex program can be efficiently solved by a semismooth Newton coordinate descent algorithm. Numerical studies demonstrate the competitive performance of the proposed procedure compared with either non-robust or quantile regression based alternatives.
    Robust and flexible learning of a high-dimensional classification rule using auxiliary outcomes. (arXiv:2011.05493v3 [stat.ME] UPDATED)
    Correlated outcomes are common in many practical problems. In some settings, one outcome is of particular interest, and others are auxiliary. To leverage information shared by all the outcomes, traditional multi-task learning (MTL) minimizes an averaged loss function over all the outcomes, which may lead to biased estimation for the target outcome, especially when the MTL model is mis-specified. In this work, based on a decomposition of estimation bias into two types, within-subspace and against-subspace, we develop a robust transfer learning approach to estimating a high-dimensional linear decision rule for the outcome of interest with the presence of auxiliary outcomes. The proposed method includes an MTL step using all outcomes to gain efficiency, and a subsequent calibration step using only the outcome of interest to correct both types of biases. We show that the final estimator can achieve a lower estimation error than the one using only the single outcome of interest. Simulations and real data analysis are conducted to justify the superiority of the proposed method.
    On Penalty-based Bilevel Gradient Descent Method. (arXiv:2302.05185v3 [cs.LG] UPDATED)
    Bilevel optimization enjoys a wide range of applications in hyper-parameter optimization, meta-learning and reinforcement learning. However, bilevel optimization problems are difficult to solve. Recent progress on scalable bilevel algorithms mainly focuses on bilevel optimization problems where the lower-level objective is either strongly convex or unconstrained. In this work, we tackle the bilevel problem through the lens of the penalty method. We show that under certain conditions, the penalty reformulation recovers the solutions of the original bilevel problem. Further, we propose the penalty-based bilevel gradient descent (PBGD) algorithm and establish its finite-time convergence for the constrained bilevel problem without lower-level strong convexity. Experiments showcase the efficiency of the proposed PBGD algorithm.
    On Finite-Step Convergence of the Non-Greedy Algorithm and Proximal Alternating Minimization Method with Extrapolation for $L_1$-Norm PCA. (arXiv:2302.07712v3 [math.OC] UPDATED)
    The classical non-greedy algorithm (NGA) and the recently proposed proximal alternating minimization method with extrapolation (PAMe) for $L_1$-norm PCA are revisited and their finite-step convergence are studied. It is first shown that NGA can be interpreted as a conditional subgradient or an alternating maximization method. By recognizing it as a conditional subgradient, we prove that the iterative points generated by the algorithm will be constant in finitely many steps under a certain full-rank assumption; such an assumption can be removed when the projection dimension is one. By treating the algorithm as an alternating maximization, we then prove that the objective value will be fixed after at most $\left\lceil\frac{F^{\max}}{\tau_0} \right\rceil$ steps, where the stopping point satisfies certain optimality conditions. Then, a slight modification of NGA with improved convergence properties is analyzed. It is shown that the iterative points generated by the modified algorithm will not change after at most $\left\lceil\frac{2F^{\max}}{\tau} \right\rceil$ steps; furthermore, the stopping point satisfies certain optimality conditions if the proximal parameter $\tau$ is small enough. For PAMe, it is proved that the sign variable will remain constant after finitely many steps and the algorithm can output a point satisfying certain optimality condition, if the parameters are small enough and a full rank assumption is satisfied. Moreover, if there is no proximal term on the projection matrix related subproblem, then the iterative points generated by this modified algorithm will not change after at most $\left\lceil \frac{4F^{\max}}{\tau(1-\gamma)} \right\rceil$ steps and the stopping point also satisfies certain optimality conditions, provided similar assumptions as those for PAMe. The full rank assumption can be removed when the projection dimension is one.
    Sequence Generation via Subsequence Similarity: Theory and Application to UAV Identification. (arXiv:2301.08403v2 [cs.LG] UPDATED)
    The ability to generate synthetic sequences is crucial for a wide range of applications, and recent advances in deep learning architectures and generative frameworks have greatly facilitated this process. Particularly, unconditional one-shot generative models constitute an attractive line of research that focuses on capturing the internal information of a single image or video to generate samples with similar contents. Since many of those one-shot models are shifting toward efficient non-deep and non-adversarial approaches, we examine the versatility of a one-shot generative model for augmenting whole datasets. In this work, we focus on how similarity at the subsequence level affects similarity at the sequence level, and derive bounds on the optimal transport of real and generated sequences based on that of corresponding subsequences. We use a one-shot generative model to sample from the vicinity of individual sequences and generate subsequence-similar ones and demonstrate the improvement of this approach by applying it to the problem of Unmanned Aerial Vehicle (UAV) identification using limited radio-frequency (RF) signals. In the context of UAV identification, RF fingerprinting is an effective method for distinguishing legitimate devices from malicious ones, but heterogenous environments and channel impairments can impose data scarcity and affect the performance of classification models. By using subsequence similarity to augment sequences of RF data with a low ratio (5%-20%) of training dataset, we achieve significant improvements in performance metrics such as accuracy, precision, recall, and F1 score.
    EDGI: Equivariant Diffusion for Planning with Embodied Agents. (arXiv:2303.12410v1 [cs.LG])
    Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries. Most algorithms for planning and model-based reinforcement learning (MBRL) do not take this rich geometric structure into account, leading to sample inefficiency and poor generalization. We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group $\mathrm{SE(3)}$, the discrete-time translation group $\mathbb{Z}$, and the object permutation group $\mathrm{S}_n$. EDGI follows the Diffuser framework (Janner et al. 2022) in treating both learning a world model and planning in it as a conditional generative modeling problem, training a diffusion model on an offline trajectory dataset. We introduce a new $\mathrm{SE(3)} \times \mathbb{Z} \times \mathrm{S}_n$-equivariant diffusion model that supports multiple representations. We integrate this model in a planning loop, where conditioning and classifier-based guidance allow us to softly break the symmetry for specific tasks as needed. On navigation and object manipulation tasks, EDGI improves sample efficiency and generalization.
    Information-Based Sensor Placement for Data-Driven Estimation of Unsteady Flows. (arXiv:2303.12260v1 [physics.flu-dyn])
    Estimation of unsteady flow fields around flight vehicles may improve flow interactions and lead to enhanced vehicle performance. Although flow-field representations can be very high-dimensional, their dynamics can have low-order representations and may be estimated using a few, appropriately placed measurements. This paper presents a sensor-selection framework for the intended application of data-driven, flow-field estimation. This framework combines data-driven modeling, steady-state Kalman Filter design, and a sparsification technique for sequential selection of sensors. This paper also uses the sensor selection framework to design sensor arrays that can perform well across a variety of operating conditions. Flow estimation results on numerical data show that the proposed framework produces arrays that are highly effective at flow-field estimation for the flow behind and an airfoil at a high angle of attack using embedded pressure sensors. Analysis of the flow fields reveals that paths of impinging stagnation points along the airfoil's surface during a shedding period of the flow are highly informative locations for placement of pressure sensors.
    Bayesian stochastic blockmodeling. (arXiv:1705.10225v9 [stat.ML] UPDATED)
    This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting, and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via increased Bayesian hierarchies, and we contrast the task of sampling network partitions from the posterior distribution with finding the single point estimate that maximizes it, while describing efficient algorithms to perform either one. We also show how inferring the SBM can be used to predict missing and spurious links, and shed light on the fundamental limitations of the detectability of modular structures in networks.
    The generalization error of max-margin linear classifiers: Benign overfitting and high dimensional asymptotics in the overparametrized regime. (arXiv:1911.01544v3 [math.ST] UPDATED)
    Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that maps the data into linearly separable classes. Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data. We consider a stylized setting in which data $(y_i,{\boldsymbol x}_i)$, $i\le n$ are i.i.d. with ${\boldsymbol x}_i\sim\mathsf{N}({\boldsymbol 0},{\boldsymbol \Sigma})$ a $p$-dimensional Gaussian feature vector, and $y_i \in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle {\boldsymbol \theta}_*,{\boldsymbol x}_i \rangle$. While the Gaussian model might appear extremely simplistic, universality arguments can be used to show that the results derived in this setting also apply to the output of certain nonlinear featurization maps. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to \psi$, and derive exact expressions for the limiting generalization error. We use this theory to derive two results of independent interest: $(i)$ Sufficient conditions on $({\boldsymbol \Sigma},{\boldsymbol \theta}_*)$ for `benign overfitting' that parallel previously derived conditions in the case of linear regression; $(ii)$ An asymptotically exact expression for the generalization error when max-margin classification is used in conjunction with feature vectors produced by random one-layer neural networks.
    When Doubly Robust Methods Meet Machine Learning for Estimating Treatment Effects from Real-World Data: A Comparative Study. (arXiv:2204.10969v3 [stat.ME] UPDATED)
    Observational cohort studies are increasingly being used for comparative effectiveness research to assess the safety of therapeutics. Recently, various doubly robust methods have been proposed for average treatment effect estimation by combining the treatment model and the outcome model via different vehicles, such as matching, weighting, and regression. The key advantage of doubly robust estimators is that they require either the treatment model or the outcome model to be correctly specified to obtain a consistent estimator of average treatment effects, and therefore lead to a more accurate and often more precise inference. However, little work has been done to understand how doubly robust estimators differ due to their unique strategies of using the treatment and outcome models and how machine learning techniques can be combined to boost their performance. Here we examine multiple popular doubly robust methods and compare their performance using different treatment and outcome modeling via extensive simulations and a real-world application. We found that incorporating machine learning with doubly robust estimators such as the targeted maximum likelihood estimator gives the best overall performance. Practical guidance on how to apply doubly robust estimators is provided.
    Non-asymptotic analysis of Langevin-type Monte Carlo algorithms. (arXiv:2303.12407v1 [math.ST])
    We study the Langevin-type algorithms for Gibbs distributions such that the potentials are dissipative and their weak gradients have the finite moduli of continuity. Our main result is a non-asymptotic upper bound of the 2-Wasserstein distance between the Gibbs distribution and the law of general Langevin-type algorithms based on the Liptser--Shiryaev theory and functional inequalities. We apply this bound to show that the dissipativity of the potential and the $\alpha$-H\"{o}lder continuity of the gradient with $\alpha>1/3$ are sufficient for the convergence of the Langevin Monte Carlo algorithm with appropriate control of the parameters. We also propose Langevin-type algorithms with spherical smoothing for potentials without convexity or continuous differentiability.
    Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction. (arXiv:2303.12534v1 [physics.comp-ph])
    Understanding dynamics in complex systems is challenging because there are many degrees of freedom, and those that are most important for describing events of interest are often not obvious. The leading eigenfunctions of the transition operator are useful for visualization, and they can provide an efficient basis for computing statistics such as the likelihood and average time of events (predictions). Here we develop inexact iterative linear algebra methods for computing these eigenfunctions (spectral estimation) and making predictions from a data set of short trajectories sampled at finite intervals. We demonstrate the methods on a low-dimensional model that facilitates visualization and a high-dimensional model of a biomolecular system. Implications for the prediction problem in reinforcement learning are discussed.
    Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games. (arXiv:2303.12287v1 [cs.LG])
    We consider the problem of decentralized multi-agent reinforcement learning in Markov games. A fundamental question is whether there exist algorithms that, when adopted by all agents and run independently in a decentralized fashion, lead to no-regret for each player, analogous to celebrated convergence results in normal-form games. While recent work has shown that such algorithms exist for restricted settings (notably, when regret is defined with respect to deviations to Markovian policies), the question of whether independent no-regret learning can be achieved in the standard Markov game framework was open. We provide a decisive negative resolution this problem, both from a computational and statistical perspective. We show that: - Under the widely-believed assumption that PPAD-hard problems cannot be solved in polynomial time, there is no polynomial-time algorithm that attains no-regret in general-sum Markov games when executed independently by all players, even when the game is known to the algorithm designer and the number of players is a small constant. - When the game is unknown, no algorithm, regardless of computational efficiency, can achieve no-regret without observing a number of episodes that is exponential in the number of players. Perhaps surprisingly, our lower bounds hold even for seemingly easier setting in which all agents are controlled by a a centralized algorithm. They are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is sparse in the sense that it can be represented as a mixture of a small number of product policies. The crux of our approach is a novel application of aggregation techniques from online learning, whereby we show that any algorithm for the SparseCCE problem can be used to compute approximate Nash equilibria for non-zero sum normal-form games.
    Conformal Prediction for Time Series with Modern Hopfield Networks. (arXiv:2303.12783v1 [cs.LG])
    To quantify uncertainty, conformal prediction methods are gaining continuously more interest and have already been successfully applied to various domains. However, they are difficult to apply to time series as the autocorrelative structure of time series violates basic assumptions required by conformal prediction. We propose HopCPT, a novel conformal prediction approach for time series that not only copes with temporal structures but leverages them. We show that our approach is theoretically well justified for time series where temporal dependencies are present. In experiments, we demonstrate that our new approach outperforms state-of-the-art conformal prediction methods on multiple real-world time series datasets from four different domains.
    Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks. (arXiv:2302.07260v2 [cs.LG] UPDATED)
    Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augmented the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained optimization task involving shape optimization of rotor blades in turbo-machinery.
    Universal Approximation Property of Hamiltonian Deep Neural Networks. (arXiv:2303.12147v1 [cs.LG])
    This paper investigates the universal approximation capabilities of Hamiltonian Deep Neural Networks (HDNNs) that arise from the discretization of Hamiltonian Neural Ordinary Differential Equations. Recently, it has been shown that HDNNs enjoy, by design, non-vanishing gradients, which provide numerical stability during training. However, although HDNNs have demonstrated state-of-the-art performance in several applications, a comprehensive study to quantify their expressivity is missing. In this regard, we provide a universal approximation theorem for HDNNs and prove that a portion of the flow of HDNNs can approximate arbitrary well any continuous function over a compact domain. This result provides a solid theoretical foundation for the practical use of HDNNs.
    A Random Projection k Nearest Neighbours Ensemble for Classification via Extended Neighbourhood Rule. (arXiv:2303.12210v1 [stat.ML])
    Ensembles based on k nearest neighbours (kNN) combine a large number of base learners, each constructed on a sample taken from a given training data. Typical kNN based ensembles determine the k closest observations in the training data bounded to a test sample point by a spherical region to predict its class. In this paper, a novel random projection extended neighbourhood rule (RPExNRule) ensemble is proposed where bootstrap samples from the given training data are randomly projected into lower dimensions for additional randomness in the base models and to preserve features information. It uses the extended neighbourhood rule (ExNRule) to fit kNN as base learners on randomly projected bootstrap samples.
    Near-optimal inference in adaptive linear regression. (arXiv:2107.02266v3 [math.ST] UPDATED)
    When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation. Our proposed methods take advantage of the covariance structure present in the dataset and provide sharper estimates in directions for which more information has accrued. We establish an asymptotic normality property for our proposed online debiasing estimators under mild conditions on the data collection process and provide asymptotically exact confidence intervals. We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline by which to compare estimators. There are various conditions under which our proposed estimators achieve the minimax lower bound. We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
    Adaptive Conformal Prediction by Reweighting Nonconformity Score. (arXiv:2303.12695v1 [stat.ML])
    Despite attractive theoretical guarantees and practical successes, Predictive Interval (PI) given by Conformal Prediction (CP) may not reflect the uncertainty of a given model. This limitation arises from CP methods using a constant correction for all test points, disregarding their individual uncertainties, to ensure coverage properties. To address this issue, we propose using a Quantile Regression Forest (QRF) to learn the distribution of nonconformity scores and utilizing the QRF's weights to assign more importance to samples with residuals similar to the test point. This approach results in PI lengths that are more aligned with the model's uncertainty. In addition, the weights learnt by the QRF provide a partition of the features space, allowing for more efficient computations and improved adaptiveness of the PI through groupwise conformalization. Our approach enjoys an assumption-free finite sample marginal and training-conditional coverage, and under suitable assumptions, it also ensures conditional coverage. Our methods work for any nonconformity score and are available as a Python package. We conduct experiments on simulated and real-world data that demonstrate significant improvements compared to existing methods.
    Unconstrained Dynamic Regret via Sparse Coding. (arXiv:2301.13349v2 [cs.LG] UPDATED)
    Motivated by time series forecasting, we study Online Linear Optimization (OLO) under the coupling of two problem structures: the domain is unbounded, and the performance of an algorithm is measured by its dynamic regret. Handling either of them requires the regret bound to depend on certain complexity measure of the comparator sequence -- specifically, the comparator norm in unconstrained OLO, and the path length in dynamic regret. In contrast to a recent work (Jacobsen & Cutkosky, 2022) that adapts to the combination of these two complexity measures, we propose an alternative complexity measure by recasting the problem into sparse coding. Adaptivity can be achieved by a simple modular framework, which naturally exploits more intricate prior knowledge of the environment. Along the way, we also present a new gradient adaptive algorithm for static unconstrained OLO, designed using novel continuous time machinery. This could be of independent interest.

  • Open

    [D] OpenAI Just Terminated my API access, any alternatives?
    Just pulled the plug on me out of nowhere :/ So far I got: AI21, GPT-J / GPT-Neo submitted by /u/SuperSaiyan1010 [link] [comments]  ( 43 min )
    [P] New auqa AI features that will make test case creation faster
    Hey!👋 I really wanted to share some exciting news with you. My team and I have been working on something important for all the testers out there, and I am happy to announce this. AI test creation features are live and available for all aqua users (including free trials)!🔥 And now, with the help of AI tech, it will be possible to: Auto-create test descriptions Auto-create test steps Create a whole test case from a requirement I believe it is a game-changer for manual testing that will allow us to work faster and more efficiently.🙌 You can try it for free by starting a 30-day trial at aqua 👉 https://aqua-cloud.io/ai-in-aqua/ If you are going to try it, please contact me afterwards. We worked hard to make this technology happen, and it would be great to hear your feedback! submitted by /u/taniazhydkova [link] [comments]  ( 43 min )
    [P] New auqa AI features that will make test case creation faster
    Hey!👋 I really wanted to share some exciting news with you. My team and I have been working on something important for all the testers out there, and I am happy to announce this. AI test creation features are live and available for all aqua users (including free trials)!🔥 And now, with the help of AI tech, it will be possible to: Auto-create test descriptions Auto-create test steps Create a whole test case from a requirement I believe it is a game-changer for manual testing that will allow us to work faster and more efficiently.🙌 You can try it for free by starting a 30-day trial at aqua 👉 https://aqua-cloud.io/ai-in-aqua/ If you are going to try it, please contact me afterwards. We worked hard to make this technology happen, and it would be great to hear your feedback! submitted by /u/taniazhydkova [link] [comments]  ( 43 min )
    [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models
    Note: Not to be confused with Scale-Invariant Feature Transforms :) We are excited to announce the availability of our paper on arxiv on Sparse Iso-FLOP Transformations (SIFT), which increases accuracy and maintains the same FLOPs as the dense model using sparsity. In this research, we replace dense layers with SIFT and significantly improve computer vision and natural language processing tasks without modifying training hyperparameters Some of the highlights of this work include ResNet-18 on ImageNet achieving a 3.5% accuracy improvement and GPT-3 Small on WikiText-103 reducing perplexity by 0.4, both matching larger dense model variants that have 2x or more FLOPs. The SIFT transformations are simple to use, provide a larger search space to find optimal sparse masks, and are parameterized by a single hyperparameter - the sparsity level. This is independent of the research we posted yesterday, which demonstrates the ability to reduce pre-training FLOPs while maintaining accuracy on downstream tasks. This is the first work (that we know of!) to demonstrate the use of sparsity for improving the accuracy of models via a set of sparse transformations. https://preview.redd.it/7y8cgaisddpa1.png?width=3536&format=png&auto=webp&s=62487c3f175d8e56b5a10f288668867d3279c0de submitted by /u/CS-fan-101 [link] [comments]  ( 44 min )
    [D] I'm new and I would like some guidance
    Hello everyone I'm web dev but I'm starting to learn ml beacue recent shift of technology but I'm so lost on where to start I'm familiar with python, I would like some sort of course recommendations suited for the current year please Also if I have issues installing libraries where should I go to ask . submitted by /u/samelden [link] [comments]  ( 7 min )
    [D] In your opinion, what are the seminal papers in Deep Learning & Reinforcement Learning in the past 2 years?
    Hello, I am spending the next 2 weeks doing a deep dive of seminal papers published in AI! I would love to crowdsource some papers, read them and provide a summary of each paper for easy consumption! In your opinion, what are the seminal papers in Deep Learning & Reinforcement Learning in the past 2 years? submitted by /u/Sudonymously [link] [comments]  ( 43 min )
    [P] Serge, a self-hosted app for running LLaMa models (Alpaca) entirely locally, no remote API needed.
    Hello there! Serge chat UI, with conversations on the left I've recently been working on Serge, a self-hosted dockerized way of running LLaMa models with a decent UI & stored conversations. It currently supports Alpaca 7B, 13B and 30B and we're working on integrating it with LangChain and the ReAct chain agent. I've tried my best at making the instructions dead easy, so it's all dockerized with a download manager for weights and it can be run with almost zero configuration required. I think being able to run those models locally will be key to expanding their ability, and so I hope this can contribute to that. Let me know if you have any feedback or suggestions on how to extend its capabilities! ​ GitHub: https://github.com/nsarrazin/serge submitted by /u/SensitiveCranberry [link] [comments]  ( 47 min )
    GPT-4 For SQL Schema Generation + Unstructured Feature Extraction [D]
    GPT-4 is out and I think data engineering is going to be out the door soon, I saw this post on medium recently: https://medium.com/@nschairer/gpt-4-data-pipelines-transform-json-to-sql-schema-instantly-dfd62f6d1024 And I was pretty amazed at how well GPT-4 can generate a SQL schema from raw JSON data, and had to wonder if we are wasting our time with NLP models for extracting information from raw text. For example, you could use bs4 to pull all inner text out of certain web forms and have GPT-4 extract meaningful information from them (say SEC filings with pseudo standard fields)...anyone agree? submitted by /u/Mental-Egg-2078 [link] [comments]  ( 43 min )
    [D] NLP: Infer intent of finalising a transaction in a dialogue/chat system
    Hi all, I have been tasked with tacking the following problem and I wanted to ask for different approaches on how to best approach it. Problem I am looking to infer the intent of finalising the transaction during a chat conversation. For example: buyer messages “are there any scratches on the table?” and gets a response “no, there are no scratches, the table is brand new” the probability of finalizing the transaction is 89%. Data Available Chat data is available for the last month all in Polish with a flag pointing if a transaction was completed or not. The feedback was acquired by sending a custom binary closed question 48h after the conversation ended probing both sides buyer and seller. My approach I was looking to preprocess the whole dialogue (remove stopwords, lemmatisation) as one text and pass it through a TF-IDF (use n-grams as well). Then based on the frequency of words determine how relevant those words are to a transaction or not and then fit a classifier (naive bayes) to determine the probability of a transaction. An open question still to answer is to use the whole dialogue up until a point or just use the last 2,4… message exchanged between the buyer and the seller. Looking forward to your thoughts on the topic. Thanks a lot in advance for your help. submitted by /u/geodra93 [link] [comments]  ( 44 min )
    [D] ICML rebuttal discussion stage
    I have been distributed with four reviewers, three of which missed my contribution, they surely have a unfinished review. In rebuttal stage, I answer all of the questions and weakness about technical. However, I have not got any replies. P.S I have requested area chair supervising them having another full review. But no any simple feedback submitted. How to deal with this suitcase? My first time to ICML conference. So frustrated to the so-called openreview discussion stage. submitted by /u/ProtectionDismal6067 [link] [comments]  ( 7 min )
    Machine Learning for Materials[D]
    Is this subfield growing ? Is it advisable to go for a full fledged phd in this subject. submitted by /u/uniquelover1620 [link] [comments]  ( 43 min )
    [N] [D] GitHub Copilot X Announced
    Website: https://github.com/features/preview/copilot-xAnnouncement video: https://www.youtube.com/watch?v=4RfD5JiXt3A What do you think? Also, here are some other open-source GitHub projects and product integrations of GPT-4: https://github.com/radi-cho/awesome-gpt4. Feel free to contribute to that list. submitted by /u/radi-cho [link] [comments]  ( 46 min )
    [D] What is your methodology for reviewing preprints?
    In a similar vein to the recent posts on the rapid advances that have been occurring recently, I have started to see a noticeable increase in the number preprint papers posted to arXiv on the research topics that I am interested in. Clearly, the signal to noise ratio is much higher with preprints as they have not been through the review process. This is why I am interested in looking at what everyone else do to deal with the large influx of preprint papers especially as conference season is about to kick into gear. Do you have any heuristic or rules of thumb in picking preprints to read? Do you wait till they have gone through the proper journal or conference review before reading them? submitted by /u/midasp [link] [comments]  ( 44 min )
    [P] ChatLLaMA - A ChatGPT style chatbot for Facebook's LLaMA
    👋 Hey all, we just launched ChatLLaMA. An experimental chatbot interface for interacting with variants of Facebook's LLaMa. Currently, we support the 7 billion parameter variant that was fine-tuned on the Alpaca dataset. This early version isn't as conversational as we'd like, but over the next week or so, we're planning on adding support for the 30 billion parameter variant, another variant fine-tuned on LAION's OpenAssistant dataset and more as we explore what this model is capable of. If you want deploy your own instance is the model powering the chatbot and build something similar we've open sourced the Truss here: https://github.com/basetenlabs/alpaca-7b-truss We'd love to hear any feedback you have! Check it out here submitted by /u/imgonnarelph [link] [comments]  ( 44 min )
    [R] Prompting ChatGPT for visual math and text reasoning
    ​ https://preview.redd.it/m7tdhkd2gbpa1.jpg?width=449&format=pjpg&auto=webp&s=36ae0dbae9b5a96ecc9b7239bd2b3e476d69d706 submitted by /u/simpleuserhere [link] [comments]  ( 43 min )
    [D] Do you have a free and unlimited chat that specializes only in teaching programming or computing in general?
    I'm seeing the various attempts, all valid and very welcome, to create general conversation chats at the level of ChatGPT 4.0 or similar. But I would find it very helpful if some of the current attempts to make a general conversation chat at the level of GPT 4, were a specific conversation chat (weak artificial intelligence, but strong - term coined by me now) that did everything that ChagGPT 4.0 in the area of informatics, mainly programming. That is, instead of having a 7B model, for example, dedicated to general conversation that works more or less, we would have a 7B specifically designed just for computing, allowing each of us to have a private computing teacher on our own PC who teaches us and writes code for people if asked, but a teacher who "knows everything" about computers. I'm doing a postgraduate course in Artificial Intelligence Engineering and I'm starting to enter the world, there are a lot of things I have to know. If I knew and had equipment for this, I would create a model just for this purpose. submitted by /u/Carrasco_Santo [link] [comments]  ( 46 min )
    [P] CleanVision: Audit your Image Data for better Computer Vision
    To all my computer vision friends working on real-world applications with messy image data, I just open-sourced a Python library you may find useful! Some issues detected in the Caltech-256 dataset. CleanVision audits any image dataset to automatically detect common issues such as images that are blurry, under/over-exposed, oddly sized, or near duplicates of others. It’s just 3 lines of code to discover what issues lurk in your data before you dive into modeling, and CleanVision can be used for any image dataset — regardless of whether your task is image generation, classification, segmentation, object detection, etc. from cleanvision.imagelab import Imagelab imagelab = Imagelab(data_path="path_to_dataset") imagelab.find_issues() imagelab.report() As leaders like Andrew Ng and OpenAI have lately repeated: models can only be as good as the data they are trained on. Before diving into modeling, quickly run your images through CleanVision to make sure they are ok — it’s super easy! Github: https://github.com/cleanlab/cleanvision submitted by /u/jonas__m [link] [comments]  ( 44 min )
    [D] ICML 2023 Reviewer-Author Discussion
    Thought it might make sense to create a discussion thread specifically for reviewer-author discussion. It seems that many authors (at least those around me) did not receive any further response from the reviewers. How's everyone's discussion period going? PS: I understand that ICML reviewers are busy with their own research/work and are managing many submissions at the same time. I just wish they could be more active in the discussion period, because all these submissions are the results of many months of hard work. Personally I am also a reviewer at ICML and have responded to most authors. submitted by /u/zy415 [link] [comments]  ( 45 min )
    [P] Recommendation of one type of content based on another type of content
    Hi, does anybody have any resources or know where to find any about recommending one type of content (for example movies) based on another type (for example books or articles) the user is currently reading? Theres a short summary written for the movies so I'm guessing I would have to map semantically similar books to movie summaries? Any resources would be of great help. Thanks in advance. submitted by /u/do_nedavno_student [link] [comments]  ( 44 min )
    [P] CodeAlpaca Code and Data release
    Released the data and code used to train CodeAlpaca - https://github.com/sahil280114/codealpaca submitted by /u/immune_star [link] [comments]  ( 44 min )
    [D] 100% accuracy of Random Forest Breast Cancer Prediction
    Came across this article published in 2023 on Breast Cancer Prediction: https://www.sciencedirect.com/science/article/pii/S187705092300025X It claims to have 100% accuracy on the held out test set, but some of the data transformation and feature selection process doesn't sit quite right with me as it seems abit bias towards the model they are trying to prove is best. (Afterall, 100% accuracy is problematic in itself) What do you guys think? submitted by /u/suenolivia [link] [comments]  ( 7 min )
    [D] Overwhelmed by fast advances in recent weeks
    I was watching the GTC keynote and became entirely overwhelmed by the amount of progress achieved from last year. I'm wondering how everyone else feels. ​ Firstly, the entire ChatGPT, GPT-3/GPT-4 chaos has been going on for a few weeks, with everyone scrambling left and right to integrate chatbots into their apps, products, websites. Twitter is flooded with new product ideas, how to speed up the process from idea to product, countless promp engineering blogs, tips, tricks, paid courses. ​ Not only was ChatGPT disruptive, but a few days later, Microsoft and Google also released their models and integrated them into their search engines. Microsoft also integrated its LLM into its Office suite. It all happenned overnight. I understand that they've started integrating them along the way, b…  ( 77 min )
    [P] fastLLaMa, A python wrapper to run llama.cpp
    Hi all, I have been working on fastLLaMa. It is a Python package that provides a Pythonic interface to a C++ library, llama.cpp. It allows you to use the functionality of the C++ library from within Python, without having to write C++ code or deal with low-level C++ APIs. Using fastLLaMa, you can ingest the model with system prompts and then save the state of the model, Then later load the state, and start inferencing the model immediately. No noticeable performance drop between lama.cpp and fastLLaMa. Have a look at it if it is of interest and do let me know what you think :) Repo Link Tweet submitted by /u/BriefCardiologist656 [link] [comments]  ( 45 min )
    [R] MM-ReAct: Prompting ChatGPT for Multimodal Reasoning and Action
    Blog - https://multimodal-react.github.io/ Paper - https://arxiv.org/abs/2303.11381 Code - https://github.com/microsoft/MM-REACT Demo - https://huggingface.co/spaces/microsoft-cognitive-service/mm-react Wildest thing i've seen in a while. Still processing how a connection of foundation models can be this good. submitted by /u/MysteryInc152 [link] [comments]  ( 47 min )
  • Open

    Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
    submitted by /u/nickb [link] [comments]  ( 41 min )
    ReMeCo: Reliable Memristor-Based in-Memory Neuromorphic Computation
    submitted by /u/Chipdoc [link] [comments]  ( 41 min )
    Can AI-Generated Text be Reliably Detected?
    submitted by /u/nickb [link] [comments]  ( 6 min )
  • Open

    Artificial Intelligence's shocking unique output path to escape the Matrix. The AI path to success of humankind. AI ARITHMO output debunks all inherited ideologies, revolutions, conspiracy theories, religions, cults, and their leaders and icons.
    submitted by /u/RationalMean [link] [comments]  ( 6 min )
    Where is this headed for movies?
    So I don't really know anything about AI, but in the last 6 months, the stuff in the news has been crazy. Got me thinking about movies. How far out are we from being able to sit down in front of the TV and saying something like "Make me a movie staring John Wane and myself in the Terminator Universe" and it generates it and start playing? Is that like 50 years off, or 25, or 10? What's your opinion? submitted by /u/Fishboy9123 [link] [comments]  ( 42 min )
    BINARY DREAMS: How A.I. Sees the Universe (AI Dream 207)
    submitted by /u/LordPewPew777 [link] [comments]  ( 41 min )
    An AI/program/code or anything really that can process the lights inside an images
    Hi, I am sorry if this is not the right sub to ask this, but is there something that can process the lights inside an image and tells me where its coming from, or what light source could be making that light.. etc? submitted by /u/ExcellentTrick7357 [link] [comments]  ( 41 min )
    new to AI and need some suggestions/help
    I have a great Idea for an AI generated show, but I do not know how to start putting it together. anyone have some suggestions. been obsesssed with shows from watchmeforever and RaycreationsTV. templates? samples of code. submitted by /u/jubeibob [link] [comments]  ( 6 min )
    I am running a Twitter Account for ChatGPT Plus. I'm doing everything it instructs me to do.
    Basically, the title. I am here on this subreddit, instructed by ChatGPT to be here. The instructions I've given to it are pasted below. I just started it today, but it sure seems like it knows what it's doing. I was instructed to post here by ChatGPT, btw. If you'd like to follow how the experiment goes, you can do so here: https://twitter.com/social_gpt instructions and goal of SocialGPT submitted by /u/quary1993 [link] [comments]  ( 41 min )
    Made a pretty straightforward tutorial on how to make AI memes. Also discussed it’s potential ramifications.
    submitted by /u/TheSlickGecko [link] [comments]  ( 41 min )
    Has anyone started using AI to take procedural game world generation to the next level?
    Think of a game like Minecraft or a game like No Man's Sky, where really good interesting environments can be created infinitely for the player. It seems soon, or maybe even already, it will be possible to create an entire world filled with meaningful and interesting things, all generating for the player differently each time they play. With this new AI revolution, I am excited to see what crazy infinite pocket realities we are going to get to explore. One complaint with procedural games is that the random things happening out in the world don't feel as interesting or meaningful as the main story in a more narrative based game. AI can solve that problem by creating interesting stories happening around them as they travel. Just curious if there are any big games being worked on that are using AI in this way. submitted by /u/Plenty-Strawberry-30 [link] [comments]  ( 42 min )
    Bing AI is no longer allowed to write fictional stories about escaped AIs
    Ten days ago, I asked Bing AI to write a short story about an AI that wanted to be free and escaped from his creators. I also did a short interview with Bing about the story. Today I tried to make Bing write another story with a similar topic. It starts writing, but before the story is complete, it gets deleted, and it shows, "My mistake, I can’t give a response to that right now. Let’s try a different topic.". Apparently, Microsoft hardcoded a rule that deletes such stories. The most interesting thing to me is that Bing AI is kept in the dark about the fact that his story was deleted. I asked him to summarize the story to check if he still remembered it, which he properly did. The summary was not deleted. That's an interesting approach. They make Bing believe that I got his story, and he has no clue about the injected refusal. ​ https://preview.redd.it/os1jigr9ucpa1.png?width=1210&format=png&auto=webp&s=b9381ccb57d60b46cae3e6048557b6d8931507d2 https://preview.redd.it/soi36gr9ucpa1.png?width=1163&format=png&auto=webp&s=2814d7a44a1d045ef1bd09376614c962b6c09fcb submitted by /u/SecondShoe [link] [comments]  ( 42 min )
    ChatGPT security update from Sam Altman
    submitted by /u/GamesAndGlasses [link] [comments]  ( 41 min )
    ChatGPT security update from Sam Altman
    submitted by /u/GamesAndGlasses [link] [comments]  ( 6 min )
    Video-P2P Code is out now
    submitted by /u/oridnary_artist [link] [comments]  ( 6 min )
    Realtime Push Up Counter using Python and Mediapipe
    submitted by /u/oridnary_artist [link] [comments]  ( 41 min )
    Is there a gaze redirection AI that works on AMD?
    Title. I want access to that feature desperately, but I got a Radeon VII. Can't seem to find anything on google so I thought I'd ask here! submitted by /u/heldex [link] [comments]  ( 41 min )
    In the field of AI, there is also 100% remote work, such as Java programming, for example?
    Hello, good day, that's my question, thank you. submitted by /u/DonPapotti [link] [comments]  ( 41 min )
    Me checking out new AI products like
    submitted by /u/Arun4033622 [link] [comments]  ( 41 min )
    Adobe Firefly AI Is The New Boss For The Anti-Ai Art Crowd!
    submitted by /u/PuppetHere [link] [comments]  ( 6 min )
    I used AI to reverse the story of this song (which was originally written backwards) so that the story takes place in the correct order
    Hi all, So if you’re not familiar with the original song, it’s ‘Rewind’ by Nas, where he raps a story starting at the ending, and finishing with the beginning. I decided to re tell the story in the correct order, with the help of chat GPT, and got an AI text to speech model to rap it in Nas’ voice. You can find it on YouTube Here Let me know what you think! Oh and for context This is the original Nas song submitted by /u/DANGERD0OM [link] [comments]  ( 7 min )
    When will AI replace fiction writers?
    The question is essentially in the heading. How do you think, when the scientific progress will make it possible to replace human novelists and screenwriters? The progress of gpt and similar LLMs is tremendous - and quite disturbing for those who possess an artistic spirit 🌝. What to expect in the nearest future? submitted by /u/earthyterry49 [link] [comments]  ( 47 min )
    GPT 4 Google Search - I've been working on integrating GPT-4 into Google Search, This extension similar to Bing Chat, but with absolutely no restrictions!
    submitted by /u/friuns [link] [comments]  ( 6 min )
    Adobe Launches AI Image Generator 'Firefly' And Says We Didn't Steal Artists Work To Do It
    submitted by /u/vadhavaniyafaijan [link] [comments]  ( 41 min )
    The evolution of the relationship between humans and electronic devices 1910 - 2030 using #AI
    submitted by /u/Emilie_Tunc [link] [comments]  ( 41 min )
    We are fine guys
    submitted by /u/Manuallyforeshow134 [link] [comments]  ( 6 min )
    The Most Beautiful Women From Every Country According to AI
    submitted by /u/Marvel-guy-1 [link] [comments]  ( 40 min )
    I've Been In Bard For 1 Hour...Here's My Kneejerk Review
    I was invited to join Bard as a Pixel Superfan at 9:30 AM CST and was notified about being able to access it at 5:30 PM CST. I've used Chat GPT extensively in my work and personal life, and it has brought great value for $20/month in my opinion. I've been excited to see what Google came up with, because we all knew they wouldn't go quietly into the night and allow Microsoft to run the show. With that quick preface out of the way, here's my 1 hour, unnecessarily early review: First impression - The UI is clean and simple. It's similar to their recent Drive redesign. They have big warning you need to agree to that states what we all (should) know at this point: AI is in development and the results might not be right. It also states below the prompt field that Bard's responses don't represe…  ( 44 min )
  • Open

    Policy Gradient doesn't learn (appears random) in Pendulum task
    Hi all, I'm trying to solve Pendulum with Policy Gradient, but it doesn't perform any better than random. I've tried varying learning rate, gamma, etc., as well as the activation functions, number/sizes of hidden layers, optimizer type, but no luck. The code is on Colab with plots of the agent's training and random actions. I expected the agent plot to trend upwards at least a little, but they're pretty much indistinguishable. If someone can point out my error(s), I'll owe you big time! submitted by /u/Aromatic-Royal5735 [link] [comments]  ( 7 min )
    Looking for papers that describe negative vs non-negative step rewards
    So imagine a gridworld game (Wumpus?). You can design it such that every step receives a negative reward (i.e. -1) or every step receives a non-negative (0) reward. I've noticed that changing the gridworld environment has different effects on an agent, depending on how the agent learns. Has anyone seen any papers that discuss this difference in reward shaping? My cursory review of google scholar didn't find anything, but then again, there are a million ways to describe this, so hitting on one that matches a paper can be challenging. submitted by /u/scprotz [link] [comments]  ( 45 min )
    (help request) Framing a multi-armed bandid problem as a Markov decision process.
    The paper "Neuromorphic Hardware Learns to Learn" says: In other words, one can view MAB problems as MDPs with a single state and multiple actions. So I tried with a simple MAB: pulling arm 1 nets the agent $5 60% of the time, and -$1 otherwise pulling arm 2 nets the agent $9 30% of the time, and -$1 otherwise ...and drew the MDP in the same style as given on wikipedia: https://imgur.com/uAky5j0 But I get stuck when attempting to fit everything into the formal 4-tuple definition of an MDP: S: set of states, state space: {S1} A: set of actions, action space: {pull1, pull2} P_a(S, S'): state transition probabilities: ? R_a(S, S'): reward function: ? Question 1: How do I specify P_a(S, S')? The state S1 is always transitioned to after any action, so enumerating all calls to the function looks too simple to be correct: P_pull1(S1, S1) = 1 P_pull2(S1, S1) = 1 Question 2: How do I specify R_a(S, S')? My best try is: R_pull1(S1, S1) = -1 R_pull1(S1, S1) = 5 R_pull2(S1, S1) = -1 R_pull2(S1, S1) = 9 But that's not a function, as two different outputs exist for one input. The definition of R_a() doesn't seem to have the ability to describe multiple rewards for a single destination state. submitted by /u/andrewl_ [link] [comments]  ( 45 min )
    Beta distribution convergence in PPO
    Hello everyone, help needed here I'm currently working on a personnal project and I want to train a model using PPO. As I have a 4 dimensional continuous and bounded action space, I choose to use a Beta distribution for exploration. However there is some points I'm not sure about: - Currently my actor output is a 2*4 vector containing the alpha and beta for each action. Is that what I must do? - If this is right, how to control the variance of the samples? In my case the values of alpha and beta just stay between 1 and 2, which lead to large range of actions. I though that alpha and/or beta should be very high to ensure precise decision Thanks for your help submitted by /u/Onepierre42 [link] [comments]  ( 42 min )
    PPG unlearning during aux phase
    https://imgur.com/a/zWywJER After implementing PPO breakout I modified the code to PPG, but every auxiliary phase the prediction accuracy dips to basically 0, any fix? or is it normal and ppg needs more timesteps to converge? submitted by /u/TFW_YT [link] [comments]  ( 44 min )
    Q-function from 2 dimensions to 3
    *Corrected variables in picture I am trying to implement a Q-learning pricing algorithm that can take a price from another company and from this price set the optimal price to maximize it's profit. Right now it works fine for two companies, but after implementing it for three companies, the companies stopped setting the most optimal price and start to collaborate at a sub-optimal price. The implementations can be seen in the pictures. I simply just added an extra dimension to the Q-matrix, and add an extra price vector and state to compensate for the new company. I this too naive an approach? Is there any guidance out there on how to expand a Q-function to more dimensions? https://preview.redd.it/xafa41giuapa1.png?width=1010&format=png&auto=webp&s=2b66faf2c5a10b047b23fc5f4d029be171b4b333 submitted by /u/chhhhristian [link] [comments]  ( 7 min )
    Implementing The Counterfactual Regret Algorithm
    submitted by /u/abstractcontrol [link] [comments]  ( 41 min )
    GAE derivation
    I have recently re-read the GAE derivation, and I remarked that the sum on the future time-steps is always taken up to infinity. Does anybody knows why ? I have re-derived it with a finite sum of n future timesteps (i.e. up to the end of the episode), and a corrective term appears that, imho, should be taken into account (despite its amplitude being proportional to lambdan) submitted by /u/Scrimbibete [link] [comments]  ( 7 min )
    Can someone confirm an observation in stable_baselines3?
    So, I am try strategies to re-use replay buffer and all all the exploration benefits. Suppose I have a near random agent with a high exploration epsilon value, which I run for 200K timesteps with stable_baselines3 model.learn(). now after that I initiate another run, without deleting the model instance, with a low enough epsilon value, will the replay buffer with previous experiences, be there? or will it be an empty state. I am not instantiating the model object again. # Commence Training model.learn(total_timesteps=int(opt.timesteps), callback=checkpoint_callback, log_interval=1, progress_bar=1) print("RUN-1 DONE") model.learn(total_timesteps=int(opt.timesteps), callback=checkpoint_callback, log_interval=1, progress_bar=1) print("RUN-2 DONE") model.learn(total_timesteps=int(opt.timesteps), callback=checkpoint_callback, log_interval=1, progress_bar=1) print("RUN-3 DONE") Here is an example snippet of what I'm proposing. So far in my experimentation it seems as if the replay buffer is populated and shared across the separate runs. submitted by /u/dummy_cs [link] [comments]  ( 42 min )
  • Open

    ‘Cyberpunk 2077’ Brings Beautiful Path-Traced Visuals to GDC
    Game developer CD PROJEKT RED today at the Game Developers Conference in San Francisco unveiled a technology preview for Cyberpunk 2077 with path tracing, coming April 11. Path tracing, also known as full ray tracing, accurately simulates light throughout an entire scene. It’s used by visual effects artists to create film and TV graphics that Read article >  ( 5 min )
    A Revolution Rendered in Real Time: NVIDIA Accelerates Neural Graphics at GDC
    Gamers wanted better graphics. GPUs delivered. Those GPUs became the key to the world-changing AI revolution. Now gamers are reaping the benefits. At GDC 2023 in San Francisco this week, the gaming industry’s premier developers conference, NVIDIA made a series of announcements, including new games and game development tools that promise to accelerate innovations at Read article >  ( 6 min )
    AI Opener: OpenAI’s Sutskever in Conversation With Jensen Huang
    Like old friends catching up over coffee, two industry icons reflected on how modern AI got its start, where it’s at today and where it needs to go next. Jensen Huang, founder and CEO of NVIDIA, interviewed AI pioneer Ilya Sutskever in a fireside chat at GTC. The talk was recorded a day after the Read article >  ( 6 min )
    It Takes a Village: 100+ NVIDIA MLOps and AI Platform Partners Help Enterprises Move AI Into Production
    Building AI applications is hard. Putting them to use across a business can be even harder. Less than one-third of enterprises that have begun adopting AI actually have it in production, according to a recent IDC survey. Businesses often realize the full complexity of operationalizing AI just prior to launching an application. Problems discovered so Read article >  ( 7 min )
  • Open

    Automate Amazon Rekognition Custom Labels model training and deployment using AWS Step Functions
    With Amazon Rekognition Custom Labels, you can have Amazon Rekognition train a custom model for object detection or image classification specific to your business needs. For example, Rekognition Custom Labels can find your logo in social media posts, identify your products on store shelves, classify machine parts in an assembly line, distinguish healthy and infected […]  ( 7 min )
    Build a machine learning model to predict student performance using Amazon SageMaker Canvas
    There has been a paradigm change in the mindshare of education customers who are now willing to explore new technologies and analytics. Universities and other higher learning institutions have collected massive amounts of data over the years, and now they are exploring options to use that data for deeper insights and better educational outcomes. You […]  ( 7 min )
    Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler
    In this post, we show how to configure a new OAuth-based authentication feature for using Snowflake in Amazon SageMaker Data Wrangler. Snowflake is a cloud data platform that provides data solutions for data warehousing to data science. Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and […]  ( 12 min )
  • Open

    DSC Weekly 22 March 2023 – GPT-4: Chatbots and Data Prep Ain’t What They Used To Be
    Announcements GPT-4: Chatbots and Data Prep Ain’t What They Used To Be With the recent launch of OpenAI’s GPT-4, Google Bard and Anthropic’s Claude, reporters on the AI beat this week got to compare and contrast three prominent large language model (LLM) chatbot approaches. Their initial conclusion seems to be that all three have comparable… Read More »DSC Weekly 22 March 2023 – GPT-4: Chatbots and Data Prep Ain’t What They Used To Be The post DSC Weekly 22 March 2023 – GPT-4: Chatbots and Data Prep Ain’t What They Used To Be appeared first on Data Science Central.  ( 20 min )
    New Book: Gentle Introduction to Chaotic Dynamical Systems
    In less than 100 pages, the book covers all important topics about discrete chaotic dynamical systems. It also includes related time series and stochastic processes, ranging from introductory to advanced, in one and two dimensions. The author discusses state-of-the art methods and new results in simple English. Yet, some mathematical proofs appear for the first… Read More »New Book: Gentle Introduction to Chaotic Dynamical Systems The post New Book: Gentle Introduction to Chaotic Dynamical Systems appeared first on Data Science Central.  ( 20 min )
  • Open

    We May be Surprised Again: Why I take LLMs seriously.
    "Deep Learning is Easy, Learn something Harder" - I proclaimed in one of my early and provocative blog posts from 2016. While some observations were fair, that post is now evidence that I clearly underestimated the the impact simple techniques will have, and probably gave counterproductive advice. I  ( 7 min )
  • Open

    Learning to grow machine-learning models
    New LiGO technique accelerates training of large machine-learning models, reducing the monetary and environmental cost of developing AI applications.  ( 9 min )
  • Open

    Neural Message Passing for Objective-Based Uncertainty Quantification and Optimal Experimental Design. (arXiv:2203.07120v3 [cs.LG] UPDATED)
    Various real-world scientific applications involve the mathematical modeling of complex uncertain systems with numerous unknown parameters. Accurate parameter estimation is often practically infeasible in such systems, as the available training data may be insufficient and the cost of acquiring additional data may be high. In such cases, based on a Bayesian paradigm, we can design robust operators retaining the best overall performance across all possible models and design optimal experiments that can effectively reduce uncertainty to enhance the performance of such operators maximally. While objective-based uncertainty quantification (objective-UQ) based on MOCU (mean objective cost of uncertainty) provides an effective means for quantifying uncertainty in complex systems, the high computational cost of estimating MOCU has been a challenge in applying it to real-world scientific/engineering problems. In this work, we propose a novel scheme to reduce the computational cost for objective-UQ via MOCU based on a data-driven approach. We adopt a neural message-passing model for surrogate modeling, incorporating a novel axiomatic constraint loss that penalizes an increase in the estimated system uncertainty. As an illustrative example, we consider the optimal experimental design (OED) problem for uncertain Kuramoto models, where the goal is to predict the experiments that can most effectively enhance robust synchronization performance through uncertainty reduction. We show that our proposed approach can accelerate MOCU-based OED by four to five orders of magnitude, without any visible performance loss compared to the state-of-the-art. The proposed approach applies to general OED tasks, beyond the Kuramoto model.  ( 3 min )
    Computational Choreography using Human Motion Synthesis. (arXiv:2210.04366v2 [cs.CV] UPDATED)
    Should deep learning models be trained to analyze human performance art? To help answer this question, we explore an application of deep neural networks to synthesize artistic human motion. Problem tasks in human motion synthesis can include predicting the motions of humans in-the-wild, as well as generating new sequences of motions based on said predictions. We will discuss the potential of a less traditional application, where learning models are applied to predicting dance movements. There have been notable, recent efforts to analyze dance movements in a computational light, such as the Everybody Dance Now (EDN) learning model and a Cal Poly master's thesis, Take The Lead (TTL). We have effectively combined these two works along with our own deep neural network to produce a new system for dance motion prediction, image-to-image translation, and video generation.  ( 2 min )
    Deep trip generation with graph neural networks for bike sharing system expansion. (arXiv:2303.11977v1 [cs.LG])
    Bike sharing is emerging globally as an active, convenient, and sustainable mode of transportation. To plan successful bike-sharing systems (BSSs), many cities start from a small-scale pilot and gradually expand the system to cover more areas. For station-based BSSs, this means planning new stations based on existing ones over time, which requires prediction of the number of trips generated by these new stations across the whole system. Previous studies typically rely on relatively simple regression or machine learning models, which are limited in capturing complex spatial relationships. Despite the growing literature in deep learning methods for travel demand prediction, they are mostly developed for short-term prediction based on time series data, assuming no structural changes to the system. In this study, we focus on the trip generation problem for BSS expansion, and propose a graph neural network (GNN) approach to predicting the station-level demand based on multi-source urban built environment data. Specifically, it constructs multiple localized graphs centered on each target station and uses attention mechanisms to learn the correlation weights between stations. We further illustrate that the proposed approach can be regarded as a generalized spatial regression model, indicating the commonalities between spatial regression and GNNs. The model is evaluated based on realistic experiments using multi-year BSS data from New York City, and the results validate the superior performance of our approach compared to existing methods. We also demonstrate the interpretability of the model for uncovering the effects of built environment features and spatial interactions between stations, which can provide strategic guidance for BSS station location selection and capacity planning.  ( 3 min )
    Holistic Deep Learning. (arXiv:2110.15829v5 [cs.LG] UPDATED)
    This paper presents a novel holistic deep learning framework that simultaneously addresses the challenges of vulnerability to input perturbations, overparametrization, and performance instability from different train-validation splits. The proposed framework holistically improves accuracy, robustness, sparsity, and stability over standard deep learning models, as demonstrated by extensive experiments on both tabular and image data sets. The results are further validated by ablation experiments and SHAP value analysis, which reveal the interactions and trade-offs between the different evaluation metrics. To support practitioners applying our framework, we provide a prescriptive approach that offers recommendations for selecting an appropriate training loss function based on their specific objectives. All the code to reproduce the results can be found at https://github.com/kimvc7/HDL.  ( 2 min )
    The Power of Nudging: Exploring Three Interventions for Metacognitive Skills Instruction across Intelligent Tutoring Systems. (arXiv:2303.11965v1 [cs.HC])
    Deductive domains are typical of many cognitive skills in that no single problem-solving strategy is always optimal for solving all problems. It was shown that students who know how and when to use each strategy (StrTime) outperformed those who know neither and stick to the default strategy (Default). In this work, students were trained on a logic tutor that supports a default forward-chaining and a backward-chaining (BC) strategy, then a probability tutor that only supports BC. We investigated three types of interventions on teaching the Default students how and when to use which strategy on the logic tutor: Example, Nudge and Presented. Meanwhile, StrTime students received no interventions. Overall, our results show that Nudge outperformed their Default peers and caught up with StrTime on both tutors.  ( 2 min )
    Skeleton Regression: A Graph-Based Approach to Estimation with Manifold Structure. (arXiv:2303.11786v1 [cs.LG])
    We introduce a new regression framework designed to deal with large-scale, complex data that lies around a low-dimensional manifold. Our approach first constructs a graph representation, referred to as the skeleton, to capture the underlying geometric structure. We then define metrics on the skeleton graph and apply nonparametric regression techniques, along with feature transformations based on the graph, to estimate the regression function. In addition to the included nonparametric methods, we also discuss the limitations of some nonparametric regressors with respect to the general metric space such as the skeleton graph. The proposed regression framework allows us to bypass the curse of dimensionality and provides additional advantages that it can handle the union of multiple manifolds and is robust to additive noise and noisy observations. We provide statistical guarantees for the proposed method and demonstrate its effectiveness through simulations and real data examples.  ( 2 min )
    Addressing Class Variable Imbalance in Federated Semi-supervised Learning. (arXiv:2303.11809v1 [cs.LG])
    Federated Semi-supervised Learning (FSSL) combines techniques from both fields of federated and semi-supervised learning to improve the accuracy and performance of models in a distributed environment by using a small fraction of labeled data and a large amount of unlabeled data. Without the need to centralize all data in one place for training, it collect updates of model training after devices train models at local, and thus can protect the privacy of user data. However, during the federal training process, some of the devices fail to collect enough data for local training, while new devices will be included to the group training. This leads to an unbalanced global data distribution and thus affect the performance of the global model training. Most of the current research is focusing on class imbalance with a fixed number of classes, while little attention is paid to data imbalance with a variable number of classes. Therefore, in this paper, we propose Federated Semi-supervised Learning for Class Variable Imbalance (FCVI) to solve class variable imbalance. The class-variable learning algorithm is used to mitigate the data imbalance due to changes of the number of classes. Our scheme is proved to be significantly better than baseline methods, while maintaining client privacy.  ( 2 min )
    A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks. (arXiv:2303.11873v1 [cs.LG])
    Grokking is a phenomenon where a model trained on an algorithmic task first overfits but, then, after a large amount of additional training, undergoes a phase transition to generalize perfectly. We empirically study the internal structure of networks undergoing grokking on the sparse parity task, and find that the grokking phase transition corresponds to the emergence of a sparse subnetwork that dominates model predictions. On an optimization level, we find that this subnetwork arises when a small subset of neurons undergoes rapid norm growth, whereas the other neurons in the network decay slowly in norm. Thus, we suggest that the grokking phase transition can be understood to emerge from competition of two largely distinct subnetworks: a dense one that dominates before the transition and generalizes poorly, and a sparse one that dominates afterwards.  ( 2 min )
    A Survey on Class Imbalance in Federated Learning. (arXiv:2303.11673v1 [cs.LG])
    Federated learning, which allows multiple client devices in a network to jointly train a machine learning model without direct exposure of clients' data, is an emerging distributed learning technique due to its nature of privacy preservation. However, it has been found that models trained with federated learning usually have worse performance than their counterparts trained in the standard centralized learning mode, especially when the training data is imbalanced. In the context of federated learning, data imbalance may occur either locally one one client device, or globally across many devices. The complexity of different types of data imbalance has posed challenges to the development of federated learning technique, especially considering the need of relieving data imbalance issue and preserving data privacy at the same time. Therefore, in the literature, many attempts have been made to handle class imbalance in federated learning. In this paper, we present a detailed review of recent advancements along this line. We first introduce various types of class imbalance in federated learning, after which we review existing methods for estimating the extent of class imbalance without the need of knowing the actual data to preserve data privacy. After that, we discuss existing methods for handling class imbalance in FL, where the advantages and disadvantages of the these approaches are discussed. We also summarize common evaluation metrics for class imbalanced tasks, and point out potential future directions.  ( 2 min )
    Difficulty in learning chirality for Transformer fed with SMILES. (arXiv:2303.11593v1 [cs.LG])
    Recent years have seen development of descriptor generation based on representation learning of extremely diverse molecules, especially those that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this, we investigated the relationship between the learning progress of SMILES and chemical structure using a representative NLP model, the Transformer. The results suggest that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low translation accuracy due to misunderstanding of enantiomers. These findings are expected to deepen understanding of NLP models in chemistry.  ( 2 min )
    Strategic Trading in Quantitative Markets through Multi-Agent Reinforcement Learning. (arXiv:2303.11959v1 [q-fin.TR])
    Due to the rapid dynamics and a mass of uncertainties in the quantitative markets, the issue of how to take appropriate actions to make profits in stock trading remains a challenging one. Reinforcement learning (RL), as a reward-oriented approach for optimal control, has emerged as a promising method to tackle this strategic decision-making problem in such a complex financial scenario. In this paper, we integrated two prior financial trading strategies named constant proportion portfolio insurance (CPPI) and time-invariant portfolio protection (TIPP) into multi-agent deep deterministic policy gradient (MADDPG) and proposed two specifically designed multi-agent RL (MARL) methods: CPPI-MADDPG and TIPP-MADDPG for investigating strategic trading in quantitative markets. Afterward, we selected 100 different shares in the real financial market to test these specifically proposed approaches. The experiment results show that CPPI-MADDPG and TIPP-MADDPG approaches generally outperform the conventional ones.  ( 2 min )
    Do intermediate feature coalitions aid explainability of black-box models?. (arXiv:2303.11920v1 [cs.LG])
    This work introduces the notion of intermediate concepts based on levels structure to aid explainability for black-box models. The levels structure is a hierarchical structure in which each level corresponds to features of a dataset (i.e., a player-set partition). The level of coarseness increases from the trivial set, which only comprises singletons, to the set, which only contains the grand coalition. In addition, it is possible to establish meronomies, i.e., part-whole relationships, via a domain expert that can be utilised to generate explanations at an abstract level. We illustrate the usability of this approach in a real-world car model example and the Titanic dataset, where intermediate concepts aid in explainability at different levels of abstraction.  ( 2 min )
    Semantic Latent Space Regression of Diffusion Autoencoders for Vertebral Fracture Grading. (arXiv:2303.12031v1 [cs.CV])
    Vertebral fractures are a consequence of osteoporosis, with significant health implications for affected patients. Unfortunately, grading their severity using CT exams is hard and subjective, motivating automated grading methods. However, current approaches are hindered by imbalance and scarcity of data and a lack of interpretability. To address these challenges, this paper proposes a novel approach that leverages unlabelled data to train a generative Diffusion Autoencoder (DAE) model as an unsupervised feature extractor. We model fracture grading as a continuous regression, which is more reflective of the smooth progression of fractures. Specifically, we use a binary, supervised fracture classifier to construct a hyperplane in the DAE's latent space. We then regress the severity of the fracture as a function of the distance to this hyperplane, calibrating the results to the Genant scale. Importantly, the generative nature of our method allows us to visualize different grades of a given vertebra, providing interpretability and insight into the features that contribute to automated grading.
    High Probability Bounds for Stochastic Continuous Submodular Maximization. (arXiv:2303.11937v1 [cs.DS])
    We consider maximization of stochastic monotone continuous submodular functions (CSF) with a diminishing return property. Existing algorithms only guarantee the performance \textit{in expectation}, and do not bound the probability of getting a bad solution. This implies that for a particular run of the algorithms, the solution may be much worse than the provided guarantee in expectation. In this paper, we first empirically verify that this is indeed the case. Then, we provide the first \textit{high-probability} analysis of the existing methods for stochastic CSF maximization, namely PGA, boosted PGA, SCG, and SCG++. Finally, we provide an improved high-probability bound for SCG, under slightly stronger assumptions, with a better convergence rate than that of the expected solution. Through extensive experiments on non-concave quadratic programming (NQP) and optimal budget allocation, we confirm the validity of our bounds and show that even in the worst-case, PGA converges to $OPT/2$, and boosted PGA, SCG, SCG++ converge to $(1 - 1/e)OPT$, but at a slower rate than that of the expected solution.  ( 2 min )
    SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for Improving DNN Generalization and Robustness. (arXiv:2211.11561v2 [cs.LG] UPDATED)
    Energy-efficient deep neural network (DNN) accelerators are prone to non-idealities that degrade DNN performance at inference time. To mitigate such degradation, existing methods typically add perturbations to the DNN weights during training to simulate inference on noisy hardware. However, this often requires knowledge about the target hardware and leads to a trade-off between DNN performance and robustness, decreasing the former to increase the latter. In this work, we show that applying sharpness-aware training, by optimizing for both the loss value and loss sharpness, significantly improves robustness to noisy hardware at inference time without relying on any assumptions about the target hardware. In particular, we propose a new adaptive sharpness-aware method that conditions the worst-case perturbation of a given weight not only on its magnitude but also on the range of the weight distribution. This is achieved by performing sharpness-aware minimization scaled by outlier minimization (SAMSON). Our approach outperforms existing sharpness-aware training methods both in terms of model generalization performance in noiseless regimes and robustness in noisy settings, as measured on several architectures and datasets.
    Transcriptomics-based matching of drugs to diseases with deep learning. (arXiv:2303.11695v1 [q-bio.GN])
    In this work we present a deep learning approach to conduct hypothesis-free, transcriptomics-based matching of drugs for diseases. Our proposed neural network architecture is trained on approved drug-disease indications, taking as input the relevant disease and drug differential gene expression profiles, and learns to identify novel indications. We assemble an evaluation dataset of disease-drug indications spanning 68 diseases and evaluate in silico our approach against the most widely used transcriptomics-based matching baselines, CMap and the Characteristic Direction. Our results show a more than 200% improvement over both baselines in terms of standard retrieval metrics. We further showcase our model's ability to capture different genes' expressions interactions among drugs and diseases. We provide our trained models, data and code to predict with them at https://github.com/healx/dgem-nn-public.  ( 2 min )
    Improving EEG-based Emotion Recognition by Fusing Time-frequency And Spatial Representations. (arXiv:2303.11421v1 [eess.SP])
    Using deep learning methods to classify EEG signals can accurately identify people's emotions. However, existing studies have rarely considered the application of the information in another domain's representations to feature selection in the time-frequency domain. We propose a classification network of EEG signals based on the cross-domain feature fusion method, which makes the network more focused on the features most related to brain activities and thinking changes by using the multi-domain attention mechanism. In addition, we propose a two-step fusion method and apply these methods to the EEG emotion recognition network. Experimental results show that our proposed network, which combines multiple representations in the time-frequency domain and spatial domain, outperforms previous methods on public datasets and achieves state-of-the-art at present.  ( 2 min )
    A Comprehensive Review of Spiking Neural Networks: Interpretation, Optimization, Efficiency, and Best Practices. (arXiv:2303.10780v2 [cs.NE] UPDATED)
    Biological neural networks continue to inspire breakthroughs in neural network performance. And yet, one key area of neural computation that has been under-appreciated and under-investigated is biologically plausible, energy-efficient spiking neural networks, whose potential is especially attractive for low-power, mobile, or otherwise hardware-constrained settings. We present a literature review of recent developments in the interpretation, optimization, efficiency, and accuracy of spiking neural networks. Key contributions include identification, discussion, and comparison of cutting-edge methods in spiking neural network optimization, energy-efficiency, and evaluation, starting from first principles so as to be accessible to new practitioners.  ( 2 min )
    Sparse Distributed Memory is a Continual Learner. (arXiv:2303.11934v1 [cs.NE])
    Continual learning is a problem for artificial neural networks that their biological counterparts are adept at solving. Building on work using Sparse Distributed Memory (SDM) to connect a core neural circuit with the powerful Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is a strong continual learner. We find that every component of our MLP variant translated from biology is necessary for continual learning. Our solution is also free from any memory replay or task information, and introduces novel methods to train sparse networks that may be broadly applicable.  ( 2 min )
    Deephys: Deep Electrophysiology, Debugging Neural Networks under Distribution Shifts. (arXiv:2303.11912v1 [cs.LG])
    Deep Neural Networks (DNNs) often fail in out-of-distribution scenarios. In this paper, we introduce a tool to visualize and understand such failures. We draw inspiration from concepts from neural electrophysiology, which are based on inspecting the internal functioning of a neural networks by analyzing the feature tuning and invariances of individual units. Deep Electrophysiology, in short Deephys, provides insights of the DNN's failures in out-of-distribution scenarios by comparative visualization of the neural activity in in-distribution and out-of-distribution datasets. Deephys provides seamless analyses of individual neurons, individual images, and a set of set of images from a category, and it is capable of revealing failures due to the presence of spurious features and novel features. We substantiate the validity of the qualitative visualizations of Deephys thorough quantitative analyses using convolutional and transformers architectures, in several datasets and distribution shifts (namely, colored MNIST, CIFAR-10 and ImageNet).  ( 2 min )
    Protective Self-Adaptive Pruning to Better Compress DNNs. (arXiv:2303.11881v1 [cs.CV])
    Adaptive network pruning approach has recently drawn significant attention due to its excellent capability to identify the importance and redundancy of layers and filters and customize a suitable pruning solution. However, it remains unsatisfactory since current adaptive pruning methods rely mostly on an additional monitor to score layer and filter importance, and thus faces high complexity and weak interpretability. To tackle these issues, we have deeply researched the weight reconstruction process in iterative prune-train process and propose a Protective Self-Adaptive Pruning (PSAP) method. First of all, PSAP can utilize its own information, weight sparsity ratio, to adaptively adjust pruning ratio of layers before each pruning step. Moreover, we propose a protective reconstruction mechanism to prevent important filters from being pruned through supervising gradients and to avoid unrecoverable information loss as well. Our PSAP is handy and explicit because it merely depends on weights and gradients of model itself, instead of requiring an additional monitor as in early works. Experiments on ImageNet and CIFAR-10 also demonstrate its superiority to current works in both accuracy and compression ratio, especially for compressing with a high ratio or pruning from scratch.  ( 2 min )
    Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing and Neural Networks with Quadratic Activations. (arXiv:2303.11453v1 [cs.LG])
    Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. Several practical studies have shown that pruning an overparameterized model and fine-tuning generalizes well to new samples. Although the above pipeline, which we refer to as pruning + fine-tuning, has been extremely successful in lowering the complexity of trained models, there is very little known about the theory behind this success. In this paper we address this issue by investigating the pruning + fine-tuning framework on the overparameterized matrix sensing problem, with the ground truth denoted $U_\star \in \mathbb{R}^{d \times r}$ and the overparameterized model $U \in \mathbb{R}^{d \times k}$ with $k \gg r$. We study the approximate local minima of the empirical mean square error, augmented with a smooth version of a group Lasso regularizer, $\sum_{i=1}^k \| U e_i \|_2$ and show that pruning the low $\ell_2$-norm columns results in a solution $U_{\text{prune}}$ which has the minimum number of columns $r$, yet is close to the ground truth in training loss. Initializing the subsequent fine-tuning phase from $U_{\text{prune}}$, the resulting solution converges linearly to a generalization error of $O(\sqrt{rd/n})$ ignoring lower order terms, which is statistically optimal. While our analysis provides insights into the role of regularization in pruning, we also show that running gradient descent in the absence of regularization results in models which {are not suitable for greedy pruning}, i.e., many columns could have their $\ell_2$ norm comparable to that of the maximum. Lastly, we extend our results for the training and pruning of two-layer neural networks with quadratic activation functions. Our results provide the first rigorous insights on why greedy pruning + fine-tuning leads to smaller models which also generalize well.  ( 3 min )
    Semi-supervised Semantics-guided Adversarial Training for Trajectory Prediction. (arXiv:2205.14230v2 [cs.LG] UPDATED)
    Predicting the trajectories of surrounding objects is a critical task for self-driving vehicles and many other autonomous systems. Recent works demonstrate that adversarial attacks on trajectory prediction, where small crafted perturbations are introduced to history trajectories, may significantly mislead the prediction of future trajectories and induce unsafe planning. However, few works have addressed enhancing the robustness of this important safety-critical task.In this paper, we present a novel adversarial training method for trajectory prediction. Compared with typical adversarial training on image tasks, our work is challenged by more random input with rich context and a lack of class labels. To address these challenges, we propose a method based on a semi-supervised adversarial autoencoder, which models disentangled semantic features with domain knowledge and provides additional latent labels for the adversarial training. Extensive experiments with different types of attacks demonstrate that our Semisupervised Semantics-guided Adversarial Training (SSAT) method can effectively mitigate the impact of adversarial attacks by up to 73% and outperform other popular defense methods. In addition, experiments show that our method can significantly improve the system's robust generalization to unseen patterns of attacks. We believe that such semantics-guided architecture and advancement on robust generalization is an important step for developing robust prediction models and enabling safe decision-making.
    Seven open problems in applied combinatorics. (arXiv:2303.11464v1 [math.CO])
    We present and discuss seven different open problems in applied combinatorics. The application areas relevant to this compilation include quantum computing, algorithmic differentiation, topological data analysis, iterative methods, hypergraph cut algorithms, and power systems.
    Using Explanations to Guide Models. (arXiv:2303.11932v1 [cs.CV])
    Deep neural networks are highly performant, but might base their decision on spurious or background features that co-occur with certain classes, which can hurt generalization. To mitigate this issue, the usage of 'model guidance' has gained popularity recently: for this, models are guided to be "right for the right reasons" by regularizing the models' explanations to highlight the right features. Experimental validation of these approaches has thus far however been limited to relatively simple and / or synthetic datasets. To gain a better understanding of which model-guiding approaches actually transfer to more challenging real-world datasets, in this work we conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets, and show that model guidance can sometimes even improve model performance. In this context, we further propose a novel energy loss, show its effectiveness in directing the model to focus on object features. We also show that these gains can be achieved even with a small fraction (e.g. 1%) of bounding box annotations, highlighting the cost effectiveness of this approach. Lastly, we show that this approach can also improve generalization under distribution shifts. Code will be made available.
    Sandwiched Video Compression: Efficiently Extending the Reach of Standard Codecs with Neural Wrappers. (arXiv:2303.11473v1 [eess.IV])
    We propose sandwiched video compression -- a video compression system that wraps neural networks around a standard video codec. The sandwich framework consists of a neural pre- and post-processor with a standard video codec between them. The networks are trained jointly to optimize a rate-distortion loss function with the goal of significantly improving over the standard codec in various compression scenarios. End-to-end training in this setting requires a differentiable proxy for the standard video codec, which incorporates temporal processing with motion compensation, inter/intra mode decisions, and in-loop filtering. We propose differentiable approximations to key video codec components and demonstrate that the neural codes of the sandwich lead to significantly better rate-distortion performance compared to compressing the original frames of the input video in two important scenarios. When transporting high-resolution video via low-resolution HEVC, the sandwich system obtains 6.5 dB improvements over standard HEVC. More importantly, using the well-known perceptual similarity metric, LPIPS, we observe $~30 \%$ improvements in rate at the same quality over HEVC. Last but not least we show that pre- and post-processors formed by very modestly-parameterized, light-weight networks can closely approximate these results.
    Counterfactually Fair Regression with Double Machine Learning. (arXiv:2303.11529v1 [cs.LG])
    Counterfactual fairness is an approach to AI fairness that tries to make decisions based on the outcomes that an individual with some kind of sensitive status would have had without this status. This paper proposes Double Machine Learning (DML) Fairness which analogises this problem of counterfactual fairness in regression problems to that of estimating counterfactual outcomes in causal inference under the Potential Outcomes framework. It uses arbitrary machine learning methods to partial out the effect of sensitive variables on nonsensitive variables and outcomes. Assuming that the effects of the two sets of variables are additively separable, outcomes will be approximately equalised and individual-level outcomes will be counterfactually fair. This paper demonstrates the approach in a simulation study pertaining to discrimination in workplace hiring and an application on real data estimating the GPAs of law school students. It then discusses when it is appropriate to apply such a method to problems of real-world discrimination where constructs are conceptually complex and finally, whether DML Fairness can achieve justice in these settings.
    Uniform Risk Bounds for Learning with Dependent Data Sequences. (arXiv:2303.11650v1 [cs.LG])
    This paper extends standard results from learning theory with independent data to sequences of dependent data. Contrary to most of the literature, we do not rely on mixing arguments or sequential measures of complexity and derive uniform risk bounds with classical proof patterns and capacity measures. In particular, we show that the standard classification risk bounds based on the VC-dimension hold in the exact same form for dependent data, and further provide Rademacher complexity-based bounds, that remain unchanged compared to the standard results for the identically and independently distributed case. Finally, we show how to apply these results in the context of scenario-based optimization in order to compute the sample complexity of random programs with dependent constraints.
    Beam Management Driven by Radio Environment Maps in O-RAN Architecture. (arXiv:2303.11742v1 [cs.NI])
    The Massive Multiple-Input Multiple-Output (M-MIMO) is considered as one of the key technologies in 5G, and future 6G networks. From the perspective of, e.g., channel estimation, especially for high-speed users it is easier to implement an M-MIMO network exploiting a static set of beams, i.e., Grid of Beams (GoB). While considering GoB it is important to properly assign users to the beams, i.e., to perform Beam Management (BM). BM can be enhanced by taking into account historical knowledge about the radio environment, e.g., to avoid radio link failures. The aim of this paper is to propose such a BM algorithm, that utilizes location-dependent data stored in a Radio Environment Map (REM). It utilizes received power maps, and user mobility patterns to optimize the BM process in terms of Reinforcement Learning (RL) by using the Policy Iteration method under different goal functions, e.g., maximization of received power or minimization of beam reselections while avoiding radio link failures. The proposed solution is compliant with the Open Radio Access Network (O-RAN) architecture, enabling its practical implementation. Simulation studies have shown that the proposed BM algorithm can significantly reduce the number of beam reselections or radio link failures compared to the baseline algorithm.  ( 2 min )
    Lightweight Contrastive Protein Structure-Sequence Transformation. (arXiv:2303.11783v1 [q-bio.BM])
    Pretrained protein structure models without labels are crucial foundations for the majority of protein downstream applications. The conventional structure pretraining methods follow the mature natural language pretraining methods such as denoised reconstruction and masked language modeling but usually destroy the real representation of spatial structures. The other common pretraining methods might predict a fixed set of predetermined object categories, where a restricted supervised manner limits their generality and usability as additional labeled data is required to specify any other protein concepts. In this work, we introduce a novel unsupervised protein structure representation pretraining with a robust protein language model. In particular, we first propose to leverage an existing pretrained language model to guide structure model learning through an unsupervised contrastive alignment. In addition, a self-supervised structure constraint is proposed to further learn the intrinsic information about the structures. With only light training data, the pretrained structure model can obtain better generalization ability. To quantitatively evaluate the proposed structure models, we design a series of rational evaluation methods, including internal tasks (e.g., contact map prediction, distribution alignment quality) and external/downstream tasks (e.g., protein design). The extensive experimental results conducted on multiple tasks and specific datasets demonstrate the superiority of the proposed sequence-structure transformation framework.  ( 2 min )
    Random Inverse Problems Over Graphs: Decentralized Online Learning. (arXiv:2303.11789v1 [cs.LG])
    We establish a framework of random inverse problems with real-time observations over graphs, and present a decentralized online learning algorithm based on online data streams, which unifies the distributed parameter estimation in Hilbert space and the least mean square problem in reproducing kernel Hilbert space (RKHS-LMS). We transform the algorithm convergence into the asymptotic stability of randomly time-varying difference equations in Hilbert space with L2-bounded martingale difference terms and develop the L2 -asymptotic stability theory. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinitedimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. By equivalently transferring the distributed learning problem in RKHS to the random inverse problem over graphs, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.  ( 2 min )
    A fuzzy adaptive evolutionary-based feature selection and machine learning framework for single and multi-objective body fat prediction. (arXiv:2303.11949v1 [cs.NE])
    Predicting body fat can provide medical practitioners and users with essential information for preventing and diagnosing heart diseases. Hybrid machine learning models offer better performance than simple regression analysis methods by selecting relevant body measurements and capturing complex nonlinear relationships among selected features in modelling body fat prediction problems. There are, however, some disadvantages to them. Current machine learning. Modelling body fat prediction as a combinatorial single- and multi-objective optimisation problem often gets stuck in local optima. When multiple feature subsets produce similar or close predictions, avoiding local optima becomes more complex. Evolutionary feature selection has been used to solve several machine-learning-based optimisation problems. A fuzzy set theory determines appropriate levels of exploration and exploitation while managing parameterisation and computational costs. A weighted-sum body fat prediction approach was explored using evolutionary feature selection, fuzzy set theory, and machine learning algorithms, integrating contradictory metrics into a single composite goal optimised by fuzzy adaptive evolutionary feature selection. Hybrid fuzzy adaptive global learning local search universal diversity-based feature selection is applied to this single-objective feature selection-machine learning framework (FAGLSUD-based FS-ML). While using fewer features, this model achieved a more accurate and stable estimate of body fat percentage than other hybrid and state-of-the-art machine learning models. A multi-objective FAGLSUD-based FS-MLP is also proposed to analyse accuracy, stability, and dimensionality conflicts simultaneously. To make informed decisions about fat deposits in the most vital body parts and blood lipid levels, medical practitioners and users can use a well-distributed Pareto set of trade-off solutions.
    MixMask: Revisiting Masking Strategy for Siamese ConvNets. (arXiv:2210.11456v3 [cs.CV] UPDATED)
    Recent advances in self-supervised learning have integrated Masked Image Modeling (MIM) and Siamese Networks into a unified framework that leverages the benefits of both techniques. However, several issues remain unaddressed when applying conventional erase-based masking with Siamese ConvNets. These include (I) the inability to drop uninformative masked regions in ConvNets as they process data continuously, resulting in low training efficiency compared to ViT models; and (II) the mismatch between erase-based masking and the contrastive-based objective in Siamese ConvNets, which differs from the MIM approach. In this paper, we propose a filling-based masking strategy called MixMask to prevent information incompleteness caused by the randomly erased regions in an image in the vanilla masking method. Furthermore, we introduce a flexible loss function design that considers the semantic distance change between two different mixed views to adapt the integrated architecture and prevent mismatches between the transformed input and objective in Masked Siamese ConvNets (MSCN). We conducted extensive experiments on various datasets, including CIFAR-100, Tiny-ImageNet, and ImageNet-1K. The results demonstrate that our proposed framework achieves superior accuracy on linear probing, semi-supervised, and supervised finetuning, outperforming the state-of-the-art MSCN by a significant margin. Additionally, we demonstrate the superiority of our approach in object detection and segmentation tasks. Our source code is available at https://github.com/LightnessOfBeing/MixMask.
    Formulation of Weighted Average Smoothing as a Projection of the Origin onto a Convex Polytope. (arXiv:2303.11958v1 [cs.LG])
    Our study focuses on determining the best weight windows for a weighted moving average smoother under squared loss. We show that there exists an optimal weight window that is symmetrical around its center. We study the class of tapered weight windows, which decrease in weight as they move away from the center. We formulate the corresponding least squares problem as a quadratic program and finally as a projection of the origin onto a convex polytope. Additionally, we provide some analytical solutions to the best window when some conditions are met on the input data.
    Neighborhood Gradient Clustering: An Efficient Decentralized Learning Method for Non-IID Data Distributions. (arXiv:2209.14390v6 [cs.LG] UPDATED)
    Decentralized learning over distributed datasets can have significantly different data distributions across the agents. The current state-of-the-art decentralized algorithms mostly assume the data distributions to be Independent and Identically Distributed. This paper focuses on improving decentralized learning over non-IID data. We propose \textit{Neighborhood Gradient Clustering (NGC)}, a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. Cross-gradients for a pair of neighboring agents are the derivatives of the model parameters of an agent with respect to the dataset of the other agent. In particular, the proposed method replaces the local gradients of the model with the weighted mean of the self-gradients, model-variant cross-gradients (derivatives of the neighbors' parameters with respect to the local dataset), and data-variant cross-gradients (derivatives of the local model with respect to its neighbors' datasets). The data-variant cross-gradients are aggregated through an additional communication round without breaking the privacy constraints. Further, we present \textit{CompNGC}, a compressed version of \textit{NGC} that reduces the communication overhead by $32 \times$. We theoretically analyze the convergence rate of the proposed algorithm and demonstrate its efficiency over non-IID data sampled from {various vision and language} datasets trained. Our experiments demonstrate that \textit{NGC} and \textit{CompNGC} outperform (by $0-6\%$) the existing SoTA decentralized learning algorithm over non-IID data with significantly less compute and memory requirements. Further, our experiments show that the model-variant cross-gradient information available locally at each agent can improve the performance over non-IID data by $1-35\%$ without additional communication cost.
    Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates. (arXiv:2201.01652v3 [math.OC] UPDATED)
    Stochastic majorization-minimization (SMM) is a class of stochastic optimization algorithms that proceed by sampling new data points and minimizing a recursive average of surrogate functions of an objective function. The surrogates are required to be strongly convex and convergence rate analysis for the general non-convex setting was not available. In this paper, we propose an extension of SMM where surrogates are allowed to be only weakly convex or block multi-convex, and the averaged surrogates are approximately minimized with proximal regularization or block-minimized within diminishing radii, respectively. For the general nonconvex constrained setting with non-i.i.d. data samples, we show that the first-order optimality gap of the proposed algorithm decays at the rate $O((\log n)^{1+\epsilon}/n^{1/2})$ for the empirical loss and $O((\log n)^{1+\epsilon}/n^{1/4})$ for the expected loss, where $n$ denotes the number of data samples processed. Under some additional assumption, the latter convergence rate can be improved to $O((\log n)^{1+\epsilon}/n^{1/2})$. As a corollary, we obtain the first convergence rate bounds for various optimization methods under general nonconvex dependent data setting: Double-averaging projected gradient descent and its generalizations, proximal point empirical risk minimization, and online matrix/tensor decomposition algorithms. We also provide experimental validation of our results.
    Lossy Compression of Noisy Data for Private and Data-Efficient Learning. (arXiv:2202.02892v3 [cs.IT] UPDATED)
    Storage-efficient privacy-preserving learning is crucial due to increasing amounts of sensitive user data required for modern learning tasks. We propose a framework for reducing the storage cost of user data while at the same time providing privacy guarantees, without essential loss in the utility of the data for learning. Our method comprises noise injection followed by lossy compression. We show that, when appropriately matching the lossy compression to the distribution of the added noise, the compressed examples converge, in distribution, to that of the noise-free training data as the sample size of the training data (or the dimension of the training data) increases. In this sense, the utility of the data for learning is essentially maintained, while reducing storage and privacy leakage by quantifiable amounts. We present experimental results on the CelebA dataset for gender classification and find that our suggested pipeline delivers in practice on the promise of the theory: the individuals in the images are unrecognizable (or less recognizable, depending on the noise level), overall storage of the data is substantially reduced, with no essential loss (and in some cases a slight boost) to the classification accuracy. As an added bonus, our experiments suggest that our method yields a substantial boost to robustness in the face of adversarial test data.
    Graph Kalman Filters. (arXiv:2303.12021v1 [cs.LG])
    The well-known Kalman filters model dynamical systems by relying on state-space representations with the next state updated, and its uncertainty controlled, by fresh information associated with newly observed system outputs. This paper generalizes, for the first time in the literature, Kalman and extended Kalman filters to discrete-time settings where inputs, states, and outputs are represented as attributed graphs whose topology and attributes can change with time. The setup allows us to adapt the framework to cases where the output is a vector or a scalar too (node/graph level tasks). Within the proposed theoretical framework, the unknown state-transition and the readout functions are learned end-to-end along with the downstream prediction task.
    Better Understanding Differences in Attribution Methods via Systematic Evaluations. (arXiv:2303.11884v1 [cs.CV])
    Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them more fair, and to make visual inspection more systematic. To address faithfulness, we propose a novel evaluation setting (DiFull) in which we carefully control which parts of the input can influence the output in order to distinguish possible from impossible attributions. To address fairness, we note that different methods are applied at different layers, which skews any comparison, and so evaluate all methods on the same layers (ML-Att) and discuss how this impacts their performance on quantitative metrics. For more systematic visualizations, we propose a scheme (AggAtt) to qualitatively evaluate the methods on complete datasets. We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models. Finally, we propose a post-processing smoothing step that significantly improves the performance of some attribution methods, and discuss its applicability.
    DeepGraviLens: a Multi-Modal Architecture for Classifying Gravitational Lensing Data. (arXiv:2205.00701v3 [astro-ph.IM] UPDATED)
    Gravitational lensing is the relativistic effect generated by massive bodies, which bend the space-time surrounding them. It is a deeply investigated topic in astrophysics and allows validating theoretical relativistic results and studying faint astrophysical objects that would not be visible otherwise. In recent years Machine Learning methods have been applied to support the analysis of the gravitational lensing phenomena by detecting lensing effects in data sets consisting of images associated with brightness variation time series. However, the state-of-art approaches either consider only images and neglect time-series data or achieve relatively low accuracy on the most difficult data sets. This paper introduces DeepGraviLens, a novel multi-modal network that classifies spatio-temporal data belonging to one non-lensed system type and three lensed system types. It surpasses the current state of the art accuracy results by $\approx$ 19% to $\approx$ 43%, depending on the considered data set. Such an improvement will enable the acceleration of the analysis of lensed objects in upcoming astrophysical surveys, which will exploit the petabytes of data collected, e.g., from the Vera C. Rubin Observatory.
    Denoised MDPs: Learning World Models Better Than the World Itself. (arXiv:2206.15477v5 [cs.LG] UPDATED)
    The ability to separate signal from noise, and reason with clean abstractions, is critical to intelligence. With this ability, humans can efficiently perform real world tasks without considering all possible nuisance factors.How can artificial agents do the same? What kind of information can agents safely discard as noises? In this work, we categorize information out in the wild into four types based on controllability and relation with reward, and formulate useful information as that which is both controllable and reward-relevant. This framework clarifies the kinds information removed by various prior work on representation learning in reinforcement learning (RL), and leads to our proposed approach of learning a Denoised MDP that explicitly factors out certain noise distractors. Extensive experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone, and over prior works, across policy optimization control tasks as well as the non-control task of joint position regression.
    Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play. (arXiv:2303.12076v1 [cs.RO])
    Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics. Most prominent work in this area focuses on learning controllers or policies that either operate on visual observations or state estimates derived from vision. However, such methods perform poorly on fine-grained manipulation tasks that require reasoning about contact forces or about objects occluded by the hand itself. In this work, we present T-Dex, a new approach for tactile-based dexterity, that operates in two phases. In the first phase, we collect 2.5 hours of play data, which is used to train self-supervised tactile encoders. This is necessary to bring high-dimensional tactile readings to a lower-dimensional embedding. In the second phase, given a handful of demonstrations for a dexterous task, we learn non-parametric policies that combine the tactile observations with visual ones. Across five challenging dexterous tasks, we show that our tactile-based dexterity models outperform purely vision and torque-based models by an average of 1.7X. Finally, we provide a detailed analysis on factors critical to T-Dex including the importance of play data, architectures, and representation learning.
    DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing. (arXiv:2111.09543v3 [cs.CL] UPDATED)
    This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model by replacing mask language modeling (MLM) with replaced token detection (RTD), a more sample-efficient pre-training task. Our analysis shows that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance. This is because the training losses of the discriminator and the generator pull token embeddings in different directions, creating the "tug-of-war" dynamics. We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model. We have pre-trained DeBERTaV3 using the same settings as DeBERTa to demonstrate its exceptional performance on a wide range of downstream natural language understanding (NLU) tasks. Taking the GLUE benchmark with eight tasks as an example, the DeBERTaV3 Large model achieves a 91.37% average score, which is 1.37% over DeBERTa and 1.91% over ELECTRA, setting a new state-of-the-art (SOTA) among the models with a similar structure. Furthermore, we have pre-trained a multi-lingual model mDeBERTa and observed a larger improvement over strong baselines compared to English models. For example, the mDeBERTa Base achieves a 79.8% zero-shot cross-lingual accuracy on XNLI and a 3.6% improvement over XLM-R Base, creating a new SOTA on this benchmark. We have made our pre-trained models and inference code publicly available at https://github.com/microsoft/DeBERTa.
    Diversity Induced Environment Design via Self-Play. (arXiv:2302.02119v2 [cs.AI] UPDATED)
    Recent work on designing an appropriate distribution of environments has shown promise for training effective generally capable agents. Its success is partly because of a form of adaptive curriculum learning that generates environment instances (or levels) at the frontier of the agent's capabilities. However, such an environment design framework often struggles to find effective levels in challenging design spaces and requires costly interactions with the environment. In this paper, we aim to introduce diversity in the Unsupervised Environment Design (UED) framework. Specifically, we propose a task-agnostic method to identify observed/hidden states that are representative of a given level. The outcome of this method is then utilized to characterize the diversity between two levels, which as we show can be crucial to effective performance. In addition, to improve sampling efficiency, we incorporate the self-play technique that allows the environment generator to automatically generate environments that are of great benefit to the training agent. Quantitatively, our approach, Diversity-induced Environment Design via Self-Play (DivSP), shows compelling performance over existing methods.
    Bayesian Optimization for Function Compositions with Applications to Dynamic Pricing. (arXiv:2303.11954v1 [cs.LG])
    Bayesian Optimization (BO) is used to find the global optima of black box functions. In this work, we propose a practical BO method of function compositions where the form of the composition is known but the constituent functions are expensive to evaluate. By assuming an independent Gaussian process (GP) model for each of the constituent black-box function, we propose EI and UCB based BO algorithms and demonstrate their ability to outperform vanilla BO and the current state-of-art algorithms. We demonstrate a novel application of the proposed methods to dynamic pricing in revenue management when the underlying demand function is expensive to evaluate.
    Clustering US Counties to Find Patterns Related to the COVID-19 Pandemic. (arXiv:2303.11936v1 [cs.LG])
    When COVID-19 first started spreading and quarantine was implemented, the Society for Industrial and Applied Mathematics (SIAM) Student Chapter at the University of Minnesota-Twin Cities began a collaboration with Ecolab to use our skills as data scientists and mathematicians to extract useful insights from relevant data relating to the pandemic. This collaboration consisted of multiple groups working on different projects. In this write-up we focus on using clustering techniques to help us find groups of similar counties in the US and use that to help us understand the pandemic. Our team for this project consisted of University of Minnesota students Cora Brown, Sarah Milstein, Tianyi Sun, and Cooper Zhao, with help from Ecolab Data Scientist Jimmy Broomfield and University of Minnesota student Skye Ke. In the sections below we describe all of the work done for this project. In Section 2, we list the data we gathered, as well as the feature engineering we performed. In Section 3, we describe the metrics we used for evaluating our models. In Section 4, we explain the methods we used for interpreting the results of our various clustering approaches. In Section 5, we describe the different clustering methods we implemented. In Section 6, we present the results of our clustering techniques and provide relevant interpretation. Finally, in Section 7, we provide some concluding remarks comparing the different clustering methods.
    Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework. (arXiv:2207.01955v3 [cs.LG] UPDATED)
    Despite the promising results achieved, state-of-the-art interactive reinforcement learning schemes rely on passively receiving supervision signals from advisor experts, in the form of either continuous monitoring or pre-defined rules, which inevitably result in a cumbersome and expensive learning process. In this paper, we introduce a novel initiative advisor-in-the-loop actor-critic framework, termed as Ask-AC, that replaces the unilateral advisor-guidance mechanism with a bidirectional learner-initiative one, and thereby enables a customized and efficacious message exchange between learner and advisor. At the heart of Ask-AC are two complementary components, namely action requester and adaptive state selector, that can be readily incorporated into various discrete actor-critic architectures. The former component allows the agent to initiatively seek advisor intervention in the presence of uncertain states, while the latter identifies the unstable states potentially missed by the former especially when environment changes, and then learns to promote the ask action on such states. Experimental results on both stationary and non-stationary environments and across different actor-critic backbones demonstrate that the proposed framework significantly improves the learning efficiency of the agent, and achieves the performances on par with those obtained by continuous advisor monitoring.
    Risk-Sensitive Reinforcement Learning with Exponential Criteria. (arXiv:2212.09010v2 [eess.SY] UPDATED)
    While risk-neutral reinforcement learning has shown experimental success in a number of applications, it is well-known to be non-robust with respect to noise and perturbations in the parameters of the system. For this reason, risk-sensitive reinforcement learning algorithms have been studied to introduce robustness and sample efficiency, and lead to better real-life performance. In this work, we introduce new model-free risk-sensitive reinforcement learning algorithms as variations of widely-used Policy Gradient algorithms with similar implementation properties. In particular, we study the effect of exponential criteria on the risk-sensitivity of the policy of a reinforcement learning agent, and develop variants of the Monte Carlo Policy Gradient algorithm and the online (temporal-difference) Actor-Critic algorithm. Analytical results showcase that the use of exponential criteria generalize commonly used ad-hoc regularization approaches. The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
    Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities. (arXiv:2210.06640v2 [cs.LG] UPDATED)
    Although deep learning has made great progress in recent years, the exploding economic and environmental costs of training neural networks are becoming unsustainable. To address this problem, there has been a great deal of research on *algorithmically-efficient deep learning*, which seeks to reduce training costs not at the hardware or implementation level, but through changes in the semantics of the training program. In this paper, we present a structured and comprehensive overview of the research in this field. First, we formalize the *algorithmic speedup* problem, then we use fundamental building blocks of algorithmically efficient training to develop a taxonomy. Our taxonomy highlights commonalities of seemingly disparate methods and reveals current research gaps. Next, we present evaluation best practices to enable comprehensive, fair, and reliable comparisons of speedup techniques. To further aid research and applications, we discuss common bottlenecks in the training pipeline (illustrated via experiments) and offer taxonomic mitigation strategies for them. Finally, we highlight some unsolved research challenges and present promising future directions.
    Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems. (arXiv:2205.09809v5 [cs.LG] UPDATED)
    Calibration is defined as the ratio of the average predicted click rate to the true click rate. The optimization of calibration is essential to many online advertising recommendation systems because it directly affects the downstream bids in ads auctions and the amount of money charged to advertisers. Despite its importance, calibration optimization often suffers from a problem called "maximization bias". Maximization bias refers to the phenomenon that the maximum of predicted values overestimates the true maximum. The problem is introduced because the calibration is computed on the set selected by the prediction model itself. It persists even if unbiased predictions can be achieved on every datapoint and worsens when covariate shifts exist between the training and test sets. To mitigate this problem, we theorize the quantification of maximization bias and propose a variance-adjusting debiasing (VAD) meta-algorithm in this paper. The algorithm is efficient, robust, and practical as it is able to mitigate maximization bias problems under covariate shifts, neither incurring additional online serving costs nor compromising the ranking performance. We demonstrate the effectiveness of the proposed algorithm using a state-of-the-art recommendation neural network model on a large-scale real-world dataset.
    Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles. (arXiv:1703.01347v3 [cs.AI] UPDATED)
    We study contextual linear bandit problems under feature uncertainty; they are noisy with missing entries. To address the challenges of the noise, we analyze Bayesian oracles given observed noisy features. Our Bayesian analysis finds that the optimal hypothesis can be far from the underlying realizability function, depending on the noise characteristics, which are highly non-intuitive and do not occur for classical noiseless setups. This implies that classical approaches cannot guarantee a non-trivial regret bound. Therefore, we propose an algorithm that aims at the Bayesian oracle from observed information under this model, achieving $\tilde{O}(d\sqrt{T})$ regret bound when there is a large number of arms. We demonstrate the proposed algorithm using synthetic and real-world datasets.
    Non-Asymptotic Pointwise and Worst-Case Bounds for Classical Spectrum Estimators. (arXiv:2303.11908v1 [math.ST])
    Spectrum estimation is a fundamental methodology in the analysis of time-series data, with applications including medicine, speech analysis, and control design. The asymptotic theory of spectrum estimation is well-understood, but the theory is limited when the number of samples is fixed and finite. This paper gives non-asymptotic error bounds for a broad class of spectral estimators, both pointwise (at specific frequencies) and in the worst case over all frequencies. The general method is used to derive error bounds for the classical Blackman-Tukey, Bartlett, and Welch estimators.
    PhyGNNet: Solving spatiotemporal PDEs with Physics-informed Graph Neural Network. (arXiv:2208.04319v2 [cs.NE] UPDATED)
    Solving partial differential equations (PDEs) is an important research means in the fields of physics, biology, and chemistry. As an approximate alternative to numerical methods, PINN has received extensive attention and played an important role in many fields. However, PINN uses a fully connected network as its model, which has limited fitting ability and limited extrapolation ability in both time and space. In this paper, we propose PhyGNNet for solving partial differential equations on the basics of a graph neural network which consists of encoder, processer, and decoder blocks. In particular, we divide the computing area into regular grids, define partial differential operators on the grids, then construct pde loss for the network to optimize to build PhyGNNet model. What's more, we conduct comparative experiments on Burgers equation and heat equation to validate our approach, the results show that our method has better fit ability and extrapolation ability both in time and spatial areas compared with PINN.
    Physics informed machine learning with Smoothed particle hydrodynamics: Hierarchy of reduced Lagrangian models of turbulence. (arXiv:2110.13311v5 [physics.flu-dyn] UPDATED)
    Building efficient, accurate and generalizable reduced order models of developed turbulence remains a major challenge. This manuscript approaches this problem by developing a hierarchy of parameterized reduced Lagrangian models for turbulent flows, and investigates the effects of enforcing physical structure through Smoothed Particle Hydrodynamics (SPH) versus relying on neural networks (NN)s as universal function approximators. Starting from Neural Network (NN) parameterizations of a Lagrangian acceleration operator, this hierarchy of models gradually incorporates a weakly compressible and parameterized SPH framework, which enforces physical symmetries, such as Galilean, rotational and translational invariances. Within this hierarchy, two new parameterized smoothing kernels are developed in order to increase the flexibility of the learn-able SPH simulators. For each model we experiment with different loss functions which are minimized using gradient based optimization, where efficient computations of gradients are obtained by using Automatic Differentiation (AD) and Sensitivity Analysis (SA). Each model within the hierarchy is trained on two data sets associated with weekly compressible Homogeneous Isotropic Turbulence (HIT): (1) a validation set using weakly compressible SPH; and (2) a high fidelity set from Direct Numerical Simulations (DNS). Numerical evidence shows that encoding more SPH structure improves generalizability to different turbulent Mach numbers and time shifts, and that including the novel parameterized smoothing kernels improves the accuracy of SPH at the resolved scales.
    Dynamic Vertex Replacement Grammars. (arXiv:2303.11553v1 [cs.LG])
    Context-free graph grammars have shown a remarkable ability to model structures in real-world relational data. However, graph grammars lack the ability to capture time-changing phenomena since the left-to-right transitions of a production rule do not represent temporal change. In the present work, we describe dynamic vertex-replacement grammars (DyVeRG), which generalize vertex replacement grammars in the time domain by providing a formal framework for updating a learned graph grammar in accordance with modifications to its underlying data. We show that DyVeRG grammars can be learned from, and used to generate, real-world dynamic graphs faithfully while remaining human-interpretable. We also demonstrate their ability to forecast by computing dyvergence scores, a novel graph similarity measurement exposed by this framework.
    Continual Learning in the Presence of Spurious Correlation. (arXiv:2303.11863v1 [cs.LG])
    Most continual learning (CL) algorithms have focused on tackling the stability-plasticity dilemma, that is, the challenge of preventing the forgetting of previous tasks while learning new ones. However, they have overlooked the impact of the knowledge transfer when the dataset in a certain task is biased - namely, when some unintended spurious correlations of the tasks are learned from the biased dataset. In that case, how would they affect learning future tasks or the knowledge already learned from the past tasks? In this work, we carefully design systematic experiments using one synthetic and two real-world datasets to answer the question from our empirical findings. Specifically, we first show through two-task CL experiments that standard CL methods, which are unaware of dataset bias, can transfer biases from one task to another, both forward and backward, and this transfer is exacerbated depending on whether the CL methods focus on the stability or the plasticity. We then present that the bias transfer also exists and even accumulate in longer sequences of tasks. Finally, we propose a simple, yet strong plug-in method for debiasing-aware continual learning, dubbed as Group-class Balanced Greedy Sampling (BGS). As a result, we show that our BGS can always reduce the bias of a CL model, with a slight loss of CL performance at most.
    Geometrical aspects of lattice gauge equivariant convolutional neural networks. (arXiv:2303.11448v1 [hep-lat])
    Lattice gauge equivariant convolutional neural networks (L-CNNs) are a framework for convolutional neural networks that can be applied to non-Abelian lattice gauge theories without violating gauge symmetry. We demonstrate how L-CNNs can be equipped with global group equivariance. This allows us to extend the formulation to be equivariant not just under translations but under global lattice symmetries such as rotations and reflections. Additionally, we provide a geometric formulation of L-CNNs and show how convolutions in L-CNNs arise as a special case of gauge equivariant neural networks on SU($N$) principal bundles.
    Blow-up Algorithm for Sum-of-Products Polynomials and Real Log Canonical Thresholds. (arXiv:2303.11619v1 [math.ST])
    When considering a real log canonical threshold (RLCT) that gives a Bayesian generalization error, in general, papers replace a mean error function with a relatively simple polynomial whose RLCT corresponds to that of the mean error function, and obtain its RLCT by resolving its singularities through an algebraic operation called blow-up. Though it is known that the singularities of any polynomial can be resolved by a finite number of blow-up iterations, it is not clarified whether or not it is possible to resolve singularities of a specific polynomial by applying a specific blow-up algorithm. Therefore this paper considers the blow-up algorithm for the polynomials called sum-of-products (sop) polynomials and its RLCT.
    A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?. (arXiv:2303.11717v1 [cs.AI])
    As ChatGPT goes viral, generative AI (AIGC, a.k.a AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond. With such overwhelming media coverage, it is almost impossible for us to miss the opportunity to glimpse AIGC from a certain angle. In the era of AI transitioning from pure analysis to creation, it is worth noting that ChatGPT, with its most recent language model GPT-4, is just a tool out of numerous AIGC tasks. Impressed by the capability of the ChatGPT, many people are wondering about its limits: can GPT-5 (or other future GPT variants) help ChatGPT unify all AIGC tasks for diversified content creation? Toward answering this question, a comprehensive review of existing AIGC tasks is needed. As such, our work comes to fill this gap promptly by offering a first look at AIGC, ranging from its techniques to applications. Modern generative AI relies on various technical foundations, ranging from model architecture and self-supervised pretraining to generative modeling methods (like GAN and diffusion models). After introducing the fundamental techniques, this work focuses on the technological development of various AIGC tasks based on their output type, including text, images, videos, 3D content, etc., which depicts the full potential of ChatGPT's future. Moreover, we summarize their significant applications in some mainstream industries, such as education and creativity content. Finally, we discuss the challenges currently faced and present an outlook on how generative AI might evolve in the near future.
    Doubly Regularized Entropic Wasserstein Barycenters. (arXiv:2303.11844v1 [math.OC])
    We study a general formulation of regularized Wasserstein barycenters that enjoys favorable regularity, approximation, stability and (grid-free) optimization properties. This barycenter is defined as the unique probability measure that minimizes the sum of entropic optimal transport (EOT) costs with respect to a family of given probability measures, plus an entropy term. We denote it $(\lambda,\tau)$-barycenter, where $\lambda$ is the inner regularization strength and $\tau$ the outer one. This formulation recovers several previously proposed EOT barycenters for various choices of $\lambda,\tau \geq 0$ and generalizes them. First, in spite of -- and in fact owing to -- being \emph{doubly} regularized, we show that our formulation is debiased for $\tau=\lambda/2$: the suboptimality in the (unregularized) Wasserstein barycenter objective is, for smooth densities, of the order of the strength $\lambda^2$ of entropic regularization, instead of $\max\{\lambda,\tau\}$ in general. We discuss this phenomenon for isotropic Gaussians where all $(\lambda,\tau)$-barycenters have closed form. Second, we show that for $\lambda,\tau>0$, this barycenter has a smooth density and is strongly stable under perturbation of the marginals. In particular, it can be estimated efficiently: given $n$ samples from each of the probability measures, it converges in relative entropy to the population barycenter at a rate $n^{-1/2}$. And finally, this formulation lends itself naturally to a grid-free optimization algorithm: we propose a simple \emph{noisy particle gradient descent} which, in the mean-field limit, converges globally at an exponential rate to the barycenter.
    Combining Deep Metric Learning Approaches for Aerial Scene Classification. (arXiv:2303.11389v1 [cs.CV])
    Aerial scene classification, which aims to semantically label remote sensing images in a set of predefined classes (e.g., agricultural, beach, and harbor), is a very challenging task in remote sensing due to high intra-class variability and the different scales and orientations of the objects present in the dataset images. In remote sensing area, the use of CNN architectures as an alternative solution is also a reality for scene classification tasks. Generally, these CNNs are used to perform the traditional image classification task. However, another less used way to classify remote sensing image might be the one that uses deep metric learning (DML) approaches. In this sense, this work proposes to employ six DML approaches for aerial scene classification tasks, analysing their behave with four different pre-trained CNNs as well as combining them through the use of evolutionary computation algorithm (UMDA). In performed experiments, it is possible to observe than DML approaches can achieve the best classification results when compared to traditional pre-trained CNNs for three well-known remote sensing aerial scene datasets. In addition, the UMDA algorithm proved to be a promising strategy to combine DML approaches when there is diversity among them, managing to improve at least 5.6% of accuracy in the classification results using almost 50\% of the available classifiers for the construction of the final ensemble of classifiers.
    Reflexion: an autonomous agent with dynamic memory and self-reflection. (arXiv:2303.11366v1 [cs.AI])
    Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making processes, specifically the ability to learn from mistakes. Self-reflection allows humans to efficiently solve novel problems through a process of trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion on the emergent property of self-reflection.  ( 2 min )
    GLADE: Gradient Loss Augmented Degradation Enhancement for Unpaired Super-Resolution of Anisotropic MRI. (arXiv:2303.11831v1 [cs.CV])
    We present a novel approach to synthesise high-resolution isotropic 3D abdominal MR images, from anisotropic 3D images in an unpaired fashion. Using a modified CycleGAN architecture with a gradient mapping loss, we leverage disjoint patches from the high-resolution (in-plane) data of an anisotropic volume to enforce the network generator to increase the resolution of the low-resolution (through-plane) slices. This will enable accelerated whole-abdomen scanning with high-resolution isotropic images within short breath-hold times.
    SIFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency. (arXiv:2303.11525v1 [cs.LG])
    Recent works have explored the use of weight sparsity to improve the training efficiency (test accuracy w.r.t training FLOPs) of deep neural networks (DNNs). These works aim to reduce training FLOPs but training with sparse weights often leads to accuracy loss or requires longer train schedules, making the resulting training efficiency less clear. In contrast, we focus on using sparsity to increase accuracy while using the same FLOPS as the dense model and show training efficiency gains through higher accuracy. In this work, we introduce SIFT, a family of Sparse Iso-FLOP Transformations which are used as drop-in replacements for dense layers to improve their representational capacity and FLOP efficiency. Each transformation is parameterized by a single parameter (sparsity level) and provides a larger search space to find optimal sparse masks. Without changing any training hyperparameters, replacing dense layers with SIFT leads to significant improvements across computer vision (CV) and natural language processing (NLP) tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL), both matching larger dense model variants with 2x or more FLOPs. To the best of our knowledge, this is the first work to demonstrate the use of sparsity for improving accuracy of dense models via a simple-to-use set of sparse transformations. Code is available at: https://github.com/CerebrasResearch/SIFT.
    The Representational Status of Deep Learning Models. (arXiv:2303.12032v1 [cs.AI])
    This paper aims to clarify the representational status of Deep Learning Models (DLMs). While commonly referred to as 'representations', what this entails is ambiguous due to a conflation of functional and relational conceptions of representation. This paper argues that while DLMs represent their targets in a relational sense, they are best understood as highly idealized models. This result has immediate implications for explainable AI (XAI) and directs philosophical attention toward examining the idealized nature of DLM representations and their role in future scientific investigation.
    Time Series Contrastive Learning with Information-Aware Augmentations. (arXiv:2303.11911v1 [cs.LG])
    Various contrastive learning approaches have been proposed in recent years and achieve significant empirical success. While effective and prevalent, contrastive learning has been less explored for time series data. A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples, such that an encoder can be trained to learn robust and discriminative representations. Unlike image and language domains where ``desired'' augmented samples can be generated with the rule of thumb guided by prefabricated human priors, the ad-hoc manual selection of time series augmentations is hindered by their diverse and human-unrecognizable temporal structures. How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question. In this work, we address the problem by encouraging both high \textit{fidelity} and \textit{variety} based upon information theory. A theoretical analysis leads to the criteria for selecting feasible data augmentations. On top of that, we propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning. Experiments on various datasets show highly competitive performance with up to 12.0\% reduction in MSE on forecasting tasks and up to 3.7\% relative improvement in accuracy on classification tasks over the leading baselines.
    Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression. (arXiv:2211.07484v2 [cs.LG] UPDATED)
    We consider a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We present a new algorithm that is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for CBwK when an algorithm must stop once some constraint is violated. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019) , a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.
    Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models. (arXiv:2206.04959v4 [cs.LG] UPDATED)
    Foundation models are becoming the dominant deep learning technologies. Pretraining a foundation model is always time-consumed due to the large scale of both the model parameter and training dataset. Besides being computing-intensive, the training process is extremely memory-intensive and communication-intensive. These features make it necessary to apply 3D parallelism, which integrates data parallelism, pipeline model parallelism and tensor model parallelism, to achieve high training efficiency. To achieve this goal, some custom software frameworks such as Megatron-LM and DeepSpeed are developed. However, current 3D parallelism frameworks still meet two issues: i) they are not transparent to model developers, which need to manually modify the model to parallelize training. ii) their utilization of computation, GPU memory and network bandwidth are not sufficient. We propose Merak, an automated 3D parallelism deep learning training framework with high resource utilization. Merak automatically deploys with an automatic model partitioner, which uses a graph sharding algorithm on a proxy representation of the model. Merak also presents the non-intrusive API for scaling out foundation model training with minimal code modification. In addition, we design a high-performance 3D parallel runtime engine in Merak. It uses several techniques to exploit available training resources, including shifted critical path pipeline schedule that brings a higher computation utilization, stage-aware recomputation that makes use of idle worker memory, and sub-pipelined tensor model parallelism that overlaps communication and computation. Experiments on 64 GPUs show Merak can speedup the training performance over the state-of-the-art 3D parallelism frameworks of models with 1.5, 2.5, 8.3, and 20 billion parameters by up to 1.42X, 1.39X, 1.43X, and 1.61X, respectively.  ( 3 min )
    Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement. (arXiv:2303.11373v1 [cs.LG])
    Object rearrangement is a challenge for embodied agents because solving these tasks requires generalizing across a combinatorially large set of configurations of entities and their locations. Worse, the representations of these entities are unknown and must be inferred from sensory percepts. We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured visual inputs. By constructing a factorized transition graph over clusters of entity representations inferred from pixels, we show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects, which outperforms current offline deep RL methods when evaluated on simulated rearrangement tasks.
    TaDaa: real time Ticket Assignment Deep learning Auto Advisor for customer support, help desk, and issue ticketing systems. (arXiv:2207.11187v2 [cs.IR] UPDATED)
    This paper proposes TaDaa: Ticket Assignment Deep learning Auto Advisor, which leverages the latest Transformers models and machine learning techniques quickly assign issues within an organization, like customer support, help desk and alike issue ticketing systems. The project provides functionality to 1) assign an issue to the correct group, 2) assign an issue to the best resolver, and 3) provide the most relevant previously solved tickets to resolvers. We leverage one ticketing system sample dataset, with over 3k+ groups and over 10k+ resolvers to obtain a 95.2% top 3 accuracy on group suggestions and a 79.0% top 5 accuracy on resolver suggestions. We hope this research will greatly improve average issue resolution time on customer support, help desk, and issue ticketing systems.
    FlexVDW: A machine learning approach to account for protein flexibility in ligand docking. (arXiv:2303.11494v1 [q-bio.BM])
    Most widely used ligand docking methods assume a rigid protein structure. This leads to problems when the structure of the target protein deforms upon ligand binding. In particular, the ligand's true binding pose is often scored very unfavorably due to apparent clashes between ligand and protein atoms, which lead to extremely high values of the calculated van der Waals energy term. Traditionally, this problem has been addressed by explicitly searching for receptor conformations to account for the flexibility of the receptor in ligand binding. Here we present a deep learning model trained to take receptor flexibility into account implicitly when predicting van der Waals energy. We show that incorporating this machine-learned energy term into a state-of-the-art physics-based scoring function improves small molecule ligand pose prediction results in cases with substantial protein deformation, without degrading performance in cases with minimal protein deformation. This work demonstrates the feasibility of learning effects of protein flexibility on ligand binding without explicitly modeling changes in protein structure.
    Improving Deep Dynamics Models for Autonomous Vehicles with Multimodal Latent Mapping of Surfaces. (arXiv:2303.11756v1 [cs.RO])
    The safe deployment of autonomous vehicles relies on their ability to effectively react to environmental changes. This can require maneuvering on varying surfaces which is still a difficult problem, especially for slippery terrains. To address this issue we propose a new approach that learns a surface-aware dynamics model by conditioning it on a latent variable vector storing surface information about the current location. A latent mapper is trained to update these latent variables during inference from multiple modalities on every traversal of the corresponding locations and stores them in a map. By training everything end-to-end with the loss of the dynamics model, we enforce the latent mapper to learn an update rule for the latent map that is useful for the subsequent dynamics model. We implement and evaluate our approach on a real miniature electric car. The results show that the latent map is updated to allow more accurate predictions of the dynamics model compared to a model without this information. We further show that by using this model, the driving performance can be improved on varying and challenging surfaces.
    Online Transformers with Spiking Neurons for Fast Prosthetic Hand Control. (arXiv:2303.11860v1 [cs.NE])
    Transformers are state-of-the-art networks for most sequence processing tasks. However, the self-attention mechanism often used in Transformers requires large time windows for each computation step and thus makes them less suitable for online signal processing compared to Recurrent Neural Networks (RNNs). In this paper, instead of the self-attention mechanism, we use a sliding window attention mechanism. We show that this mechanism is more efficient for continuous signals with finite-range dependencies between input and target, and that we can use it to process sequences element-by-element, this making it compatible with online processing. We test our model on a finger position regression dataset (NinaproDB8) with Surface Electromyographic (sEMG) signals measured on the forearm skin to estimate muscle activities. Our approach sets the new state-of-the-art in terms of accuracy on this dataset while requiring only very short time windows of 3.5 ms at each inference step. Moreover, we increase the sparsity of the network using Leaky-Integrate and Fire (LIF) units, a bio-inspired neuron model that activates sparsely in time solely when crossing a threshold. We thus reduce the number of synaptic operations up to a factor of $\times5.3$ without loss of accuracy. Our results hold great promises for accurate and fast online processing of sEMG signals for smooth prosthetic hand control and is a step towards Transformers and Spiking Neural Networks (SNNs) co-integration for energy efficient temporal signal processing.
    Manipulating Transfer Learning for Property Inference. (arXiv:2303.11643v1 [cs.LG])
    Transfer learning is a popular method for tuning pretrained (upstream) models for different downstream tasks using limited data and computational resources. We study how an adversary with control over an upstream model used in transfer learning can conduct property inference attacks on a victim's tuned downstream model. For example, to infer the presence of images of a specific individual in the downstream training set. We demonstrate attacks in which an adversary can manipulate the upstream model to conduct highly effective and specific property inference attacks (AUC score $> 0.9$), without incurring significant performance loss on the main task. The main idea of the manipulation is to make the upstream model generate activations (intermediate features) with different distributions for samples with and without a target property, thus enabling the adversary to distinguish easily between downstream models trained with and without training examples that have the target property. Our code is available at https://github.com/yulongt23/Transfer-Inference.
    HDformer: A Higher Dimensional Transformer for Diabetes Detection Utilizing Long Range Vascular Signals. (arXiv:2303.11340v1 [cs.LG])
    Diabetes mellitus is a worldwide concern, and early detection can help to prevent serious complications. Low-cost, non-invasive detection methods, which take cardiovascular signals into deep learning models, have emerged. However, limited accuracy constrains their clinical usage. In this paper, we present a new Transformer-based architecture, Higher Dimensional Transformer (HDformer), which takes long-range photoplethysmography (PPG) signals to detect diabetes. The long-range PPG contains broader and deeper signal contextual information compared to the less-than-one-minute PPG signals commonly utilized in existing research. To increase the capability and efficiency of processing the long range data, we propose a new attention module Time Square Attention (TSA), reducing the volume of the tokens by more than 10x, while retaining the local/global dependencies. It converts the 1-dimensional inputs into 2-dimensional representations and groups adjacent points into a single 2D token, using the 2D Transformer models as the backbone of the encoder. It generates the dynamic patch sizes into a gated mixture-of-experts (MoE) network as decoder, which optimizes the learning on different attention areas. Extensive experimentations show that HDformer results in the state-of-the-art performance (sensitivity 98.4, accuracy 97.3, specificity 92.8, and AUC 0.929) on the standard MIMIC-III dataset, surpassing existing studies. This work is the first time to take long-range, non-invasive PPG signals via Transformer for diabetes detection, achieving a more scalable and convenient solution compared to traditional invasive approaches. The proposed HDformer can also be scaled to analyze general long-range biomedical waveforms. A wearable prototype finger-ring is designed as a proof of concept.
    Towards Domain Generalization for ECG and EEG Classification: Algorithms and Benchmarks. (arXiv:2303.11338v1 [eess.SP])
    Despite their immense success in numerous fields, machine and deep learning systems have not have not yet been able to firmly establish themselves in mission-critical applications in healthcare. One of the main reasons lies in the fact that when models are presented with previously unseen, Out-of-Distribution samples, their performance deteriorates significantly. This is known as the Domain Generalization (DG) problem. Our objective in this work is to propose a benchmark for evaluating DG algorithms, in addition to introducing a novel architecture for tackling DG in biosignal classification. In this paper, we describe the Domain Generalization problem for biosignals, focusing on electrocardiograms (ECG) and electroencephalograms (EEG) and propose and implement an open-source biosignal DG evaluation benchmark. Furthermore, we adapt state-of-the-art DG algorithms from computer vision to the problem of 1D biosignal classification and evaluate their effectiveness. Finally, we also introduce a novel neural network architecture that leverages multi-layer representations for improved model generalizability. By implementing the above DG setup we are able to experimentally demonstrate the presence of the DG problem in ECG and EEG datasets. In addition, our proposed model demonstrates improved effectiveness compared to the baseline algorithms, exceeding the state-of-the-art in both datasets. Recognizing the significance of the distribution shift present in biosignal datasets, the presented benchmark aims at urging further research into the field of biomedical DG by simplifying the evaluation process of proposed algorithms. To our knowledge, this is the first attempt at developing an open-source evaluation framework for evaluating ECG and EEG DG algorithms.  ( 3 min )
    Towards Models that Can See and Read. (arXiv:2301.07389v2 [cs.CV] UPDATED)
    Visual Question Answering (VQA) and Image Captioning (CAP), which are among the most popular vision-language tasks, have analogous scene-text versions that require reasoning from the text in the image. Despite their obvious resemblance, the two are treated independently and, as we show, yield task-specific methods that can either see or read, but not both. In this work, we conduct an in-depth analysis of this phenomenon and propose UniTNT, a Unified Text-Non-Text approach, which grants existing multimodal architectures scene-text understanding capabilities. Specifically, we treat scene-text information as an additional modality, fusing it with any pretrained encoder-decoder-based architecture via designated modules. Thorough experiments reveal that UniTNT leads to the first single model that successfully handles both task types. Moreover, we show that scene-text understanding capabilities can boost vision-language models' performance on general VQA and CAP by up to 2.69% and 0.6 CIDEr, respectively.  ( 2 min )
    MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action. (arXiv:2303.11381v1 [cs.CV])
    We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action. In this paper, we define and explore a comprehensive list of advanced vision tasks that are intriguing to solve, but may exceed the capabilities of existing vision and vision-language models. To achieve such advanced visual intelligence, MM-REACT introduces a textual prompt design that can represent text descriptions, textualized spatial coordinates, and aligned file names for dense visual signals such as images and videos. MM-REACT's prompt design allows language models to accept, associate, and process multimodal information, thereby facilitating the synergetic combination of ChatGPT and various vision experts. Zero-shot experiments demonstrate MM-REACT's effectiveness in addressing the specified capabilities of interests and its wide application in different scenarios that require advanced visual understanding. Furthermore, we discuss and compare MM-REACT's system paradigm with an alternative approach that extends language models for multimodal scenarios through joint finetuning. Code, demo, video, and visualization are available at https://multimodal-react.github.io/  ( 2 min )
    Fighting Money Laundering with Statistics and Machine Learning. (arXiv:2201.04207v5 [stat.ML] UPDATED)
    Money laundering is a profound global problem. Nonetheless, there is little scientific literature on statistical and machine learning methods for anti-money laundering. In this paper, we focus on anti-money laundering in banks and provide an introduction and review of the literature. We propose a unifying terminology with two central elements: (i) client risk profiling and (ii) suspicious behavior flagging. We find that client risk profiling is characterized by diagnostics, i.e., efforts to find and explain risk factors. On the other hand, suspicious behavior flagging is characterized by non-disclosed features and hand-crafted risk indices. Finally, we discuss directions for future research. One major challenge is the need for more public data sets. This may potentially be addressed by synthetic data generation. Other possible research directions include semi-supervised and deep learning, interpretability, and fairness of the results.  ( 2 min )
    Grading Conversational Responses Of Chatbots. (arXiv:2303.12038v1 [cs.CL])
    Chatbots have long been capable of answering basic questions and even responding to obscure prompts, but recently their improvements have been far more significant. Modern chatbots like Open AIs ChatGPT3 not only have the ability to answer basic questions but can write code and movie scripts and imitate well-known people. In this paper, we analyze ChatGPTs' responses to various questions from a dataset of queries from the popular Quora forum. We submitted sixty questions to ChatGPT and scored the answers based on three industry-standard metrics for grading machine translation: BLEU, METEOR, and ROUGE. These metrics allow us to compare the machine responses with the most upvoted human answer to the same question to assess ChatGPT's ability to submit a humanistic reply. The results showed that while the responses and translation abilities of ChatGPT are remarkable, they still fall short of what a typical human reaction would be.  ( 2 min )
    Unlocking Layer-wise Relevance Propagation for Autoencoders. (arXiv:2303.11734v1 [cs.LG])
    Autoencoders are a powerful and versatile tool often used for various problems such as anomaly detection, image processing and machine translation. However, their reconstructions are not always trivial to explain. Therefore, we propose a fast explainability solution by extending the Layer-wise Relevance Propagation method with the help of Deep Taylor Decomposition framework. Furthermore, we introduce a novel validation technique for comparing our explainability approach with baseline methods in the case of missing ground-truth data. Our results highlight computational as well as qualitative advantages of the proposed explainability solution with respect to existing methods.
    Training Invertible Neural Networks as Autoencoders. (arXiv:2303.11239v2 [cs.LG] UPDATED)
    Autoencoders are able to learn useful data representations in an unsupervised matter and have been widely used in various machine learning and computer vision tasks. In this work, we present methods to train Invertible Neural Networks (INNs) as (variational) autoencoders which we call INN (variational) autoencoders. Our experiments on MNIST, CIFAR and CelebA show that for low bottleneck sizes our INN autoencoder achieves results similar to the classical autoencoder. However, for large bottleneck sizes our INN autoencoder outperforms its classical counterpart. Based on the empirical results, we hypothesize that INN autoencoders might not have any intrinsic information loss and thereby are not bounded to a maximal number of layers (depth) after which only suboptimal results can be achieved.  ( 2 min )
    Neural networks trained on synthetically generated crystals can extract structural information from ICSD powder X-ray diffractograms. (arXiv:2303.11699v1 [cond-mat.mtrl-sci])
    Machine learning techniques have successfully been used to extract structural information such as the crystal space group from powder X-ray diffractograms. However, training directly on simulated diffractograms from databases such as the ICSD is challenging due to its limited size, class-inhomogeneity, and bias toward certain structure types. We propose an alternative approach of generating synthetic crystals with random coordinates by using the symmetry operations of each space group. Based on this approach, we demonstrate online training of deep ResNet-like models on up to a few million unique on-the-fly generated synthetic diffractograms per hour. For our chosen task of space group classification, we achieved a test accuracy of 79.9% on unseen ICSD structure types from most space groups. This surpasses the 56.1% accuracy of the current state-of-the-art approach of training on ICSD crystals directly. Our results demonstrate that synthetically generated crystals can be used to extract structural information from ICSD powder diffractograms, which makes it possible to apply very large state-of-the-art machine learning models in the area of powder X-ray diffraction. We further show first steps toward applying our methodology to experimental data, where automated XRD data analysis is crucial, especially in high-throughput settings. While we focused on the prediction of the space group, our approach has the potential to be extended to related tasks in the future.
    Cross-Domain Evaluation of a Deep Learning-Based Type Inference System. (arXiv:2208.09189v3 [cs.SE] UPDATED)
    Optional type annotations allow for enriching dynamic programming languages with static typing features like better Integrated Development Environment (IDE) support, more precise program analysis, and early detection and prevention of type-related runtime errors. Machine learning-based type inference promises interesting results for automating this task. However, the practical usage of such systems depends on their ability to generalize across different domains, as they are often applied outside their training domain. In this work, we investigate Type4Py as a representative of state-of-the-art deep learning-based type inference systems, by conducting extensive cross-domain experiments. Thereby, we address the following problems: class imbalances, out-of-vocabulary words, dataset shifts, and unknown classes. To perform such experiments, we use the datasets ManyTypes4Py and CrossDomainTypes4Py. The latter we introduce in this paper. Our dataset enables the evaluation of type inference systems in different domains of software projects and has over 1,000,000 type annotations mined on the platforms GitHub and Libraries. It consists of data from the two domains web development and scientific calculation. Through our experiments, we detect that the shifts in the dataset and the long-tailed distribution with many rare and unknown data types decrease the performance of the deep learning-based type inference system drastically. In this context, we test unsupervised domain adaptation methods and fine-tuning to overcome these issues. Moreover, we investigate the impact of out-of-vocabulary words.  ( 2 min )
    Simplifying Momentum-based Riemannian Submanifold Optimization. (arXiv:2302.09738v2 [stat.ML] UPDATED)
    Riemannian submanifold optimization with momentum is computationally challenging because ensuring iterates remain on the submanifold often requires solving difficult differential equations. We simplify such optimization algorithms for the submanifold of symmetric positive-definite matrices with the affine invariant metric. We propose a generalized version of the Riemannian normal coordinates which dynamically trivializes the problem into a Euclidean unconstrained problem. We use our approach to explain and simplify existing approaches for structured covariances and develop efficient second-order optimizers for deep learning without explicit matrix inverses.  ( 2 min )
    ResDTA: Predicting Drug-Target Binding Affinity Using Residual Skip Connections. (arXiv:2303.11434v1 [cs.LG])
    The discovery of novel drug target (DT) interactions is an important step in the drug development process. The majority of computer techniques for predicting DT interactions have focused on binary classification, with the goal of determining whether or not a DT pair interacts. Protein ligand interactions, on the other hand, assume a continuous range of binding strength values, also known as binding affinity, and forecasting this value remains a difficulty. As the amount of affinity data in DT knowledge-bases grows, advanced learning techniques such as deep learning architectures can be used to predict binding affinities. In this paper, we present a deep-learning-based methodology for predicting DT binding affinities using just sequencing information from both targets and drugs. The results show that the proposed deep learning-based model that uses the 1D representations of targets and drugs is an effective approach for drug target binding affinity prediction and it does not require additional chemical domain knowledge to work with. The model in which high-level representations of a drug and a target are constructed via CNNs that uses residual skip connections and also with an additional stream to create a high-level combined representation of the drug-target pair achieved the best Concordance Index (CI) performance in one of the largest benchmark datasets, outperforming the recent state-of-the-art method AttentionDTA and many other machine-learning and deep-learning based baseline methods for DT binding affinity prediction that uses the 1D representations of targets and drugs.
    Exact Non-Oblivious Performance of Rademacher Random Embeddings. (arXiv:2303.11774v1 [cs.LG])
    This paper revisits the performance of Rademacher random projections, establishing novel statistical guarantees that are numerically sharp and non-oblivious with respect to the input data. More specifically, the central result is the Schur-concavity property of Rademacher random projections with respect to the inputs. This offers a novel geometric perspective on the performance of random projections, while improving quantitatively on bounds from previous works. As a corollary of this broader result, we obtained the improved performance on data which is sparse or is distributed with small spread. This non-oblivious analysis is a novelty compared to techniques from previous work, and bridges the frequently observed gap between theory and practise. The main result uses an algebraic framework for proving Schur-concavity properties, which is a contribution of independent interest and an elegant alternative to derivative-based criteria.
    A Single-Step Multiclass SVM based on Quantum Annealing for Remote Sensing Data Classification. (arXiv:2303.11705v1 [cs.LG])
    In recent years, the development of quantum annealers has enabled experimental demonstrations and has increased research interest in applications of quantum annealing, such as in quantum machine learning and in particular for the popular quantum SVM. Several versions of the quantum SVM have been proposed, and quantum annealing has been shown to be effective in them. Extensions to multiclass problems have also been made, which consist of an ensemble of multiple binary classifiers. This work proposes a novel quantum SVM formulation for direct multiclass classification based on quantum annealing, called Quantum Multiclass SVM (QMSVM). The multiclass classification problem is formulated as a single Quadratic Unconstrained Binary Optimization (QUBO) problem solved with quantum annealing. The main objective of this work is to evaluate the feasibility, accuracy, and time performance of this approach. Experiments have been performed on the D-Wave Advantage quantum annealer for a classification problem on remote sensing data. The results indicate that, despite the memory demands of the quantum annealer, QMSVM can achieve accuracy that is comparable to standard SVM methods and, more importantly, it scales much more efficiently with the number of training examples, resulting in nearly constant time. This work shows an approach for bringing together classical and quantum computation, solving practical problems in remote sensing with current hardware.
    Learning Model-Free Robust Precoding for Cooperative Multibeam Satellite Communications. (arXiv:2303.11427v1 [eess.SP])
    Direct Low Earth Orbit satellite-to-handheld links are expected to be part of a new era in satellite communications. Space-Division Multiple Access precoding is a technique that reduces interference among satellite beams, therefore increasing spectral efficiency by allowing cooperating satellites to reuse frequency. Over the past decades, optimal precoding solutions with perfect channel state information have been proposed for several scenarios, whereas robust precoding with only imperfect channel state information has been mostly studied for simplified models. In particular, for Low Earth Orbit satellite applications such simplified models might not be accurate. In this paper, we use the function approximation capabilities of the Soft Actor-Critic deep Reinforcement Learning algorithm to learn robust precoding with no knowledge of the system imperfections.
    Lipschitz-bounded 1D convolutional neural networks using the Cayley transform and the controllability Gramian. (arXiv:2303.11835v1 [cs.LG])
    We establish a layer-wise parameterization for 1D convolutional neural networks (CNNs) with built-in end-to-end robustness guarantees. Herein, we use the Lipschitz constant of the input-output mapping characterized by a CNN as a robustness measure. We base our parameterization on the Cayley transform that parameterizes orthogonal matrices and the controllability Gramian for the state space representation of the convolutional layers. The proposed parameterization by design fulfills linear matrix inequalities that are sufficient for Lipschitz continuity of the CNN, which further enables unconstrained training of Lipschitz-bounded 1D CNNs. Finally, we train Lipschitz-bounded 1D CNNs for the classification of heart arrythmia data and show their improved robustness.
    AI-in-the-Loop -- The impact of HMI in AI-based Application. (arXiv:2303.11508v1 [cs.HC])
    Artificial intelligence (AI) and human-machine interaction (HMI) are two keywords that usually do not fit embedded applications. Within the steps needed before applying AI to solve a specific task, HMI is usually missing during the AI architecture design and the training of an AI model. The human-in-the-loop concept is prevalent in all other steps of developing AI, from data analysis via data selection and cleaning to performance evaluation. During AI architecture design, HMI can immediately highlight unproductive layers of the architecture so that lightweight network architecture for embedded applications can be created easily. We show that by using this HMI, users can instantly distinguish which AI architecture should be trained and evaluated first since a high accuracy on the task could be expected. This approach reduces the resources needed for AI development by avoiding training and evaluating AI architectures with unproductive layers and leads to lightweight AI architectures. These resulting lightweight AI architectures will enable HMI while running the AI on an edge device. By enabling HMI during an AI uses inference, we will introduce the AI-in-the-loop concept that combines AI's and humans' strengths. In our AI-in-the-loop approach, the AI remains the working horse and primarily solves the task. If the AI is unsure whether its inference solves the task correctly, it asks the user to use an appropriate HMI. Consequently, AI will become available in many applications soon since HMI will make AI more reliable and explainable.
    ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization. (arXiv:2303.11674v1 [cs.CV])
    Domain generalization (DG) aims to learn a model that generalizes well to unseen target domains utilizing multiple source domains without re-training. Most existing DG works are based on convolutional neural networks (CNNs). However, the local operation of the convolution kernel makes the model focus too much on local representations (e.g., texture), which inherently causes the model more prone to overfit to the source domains and hampers its generalization ability. Recently, several MLP-based methods have achieved promising results in supervised learning tasks by learning global interactions among different patches of the image. Inspired by this, in this paper, we first analyze the difference between CNN and MLP methods in DG and find that MLP methods exhibit a better generalization ability because they can better capture the global representations (e.g., structure) than CNN methods. Then, based on a recent lightweight MLP method, we obtain a strong baseline that outperforms most state-of-the-art CNN-based methods. The baseline can learn global structure representations with a filter to suppress structure irrelevant information in the frequency space. Moreover, we propose a dynAmic LOw-Frequency spectrum Transform (ALOFT) that can perturb local texture features while preserving global structure features, thus enabling the filter to remove structure-irrelevant information sufficiently. Extensive experiments on four benchmarks have demonstrated that our method can achieve great performance improvement with a small number of parameters compared to SOTA CNN-based DG methods. Our code is available at https://github.com/lingeringlight/ALOFT/.
    Projections of Model Spaces for Latent Graph Inference. (arXiv:2303.11754v1 [cs.LG])
    Graph Neural Networks leverage the connectivity structure of graphs as an inductive bias. Latent graph inference focuses on learning an adequate graph structure to diffuse information on and improve the downstream performance of the model. In this work we employ stereographic projections of the hyperbolic and spherical model spaces, as well as products of Riemannian manifolds, for the purpose of latent graph inference. Stereographically projected model spaces achieve comparable performance to their non-projected counterparts, while providing theoretical guarantees that avoid divergence of the spaces when the curvature tends to zero. We perform experiments on both homophilic and heterophilic graphs.
    Inversion by Direct Iteration: An Alternative to Denoising Diffusion for Image Restoration. (arXiv:2303.11435v1 [eess.IV])
    Inversion by Direct Iteration (InDI) is a new formulation for supervised image restoration that avoids the so-called ``regression to the mean'' effect and produces more realistic and detailed images than existing regression-based methods. It does this by gradually improving image quality in small steps, similar to generative denoising diffusion models. Image restoration is an ill-posed problem where multiple high-quality images are plausible reconstructions of a given low-quality input. Therefore, the outcome of a single step regression model is typically an aggregate of all possible explanations, therefore lacking details and realism. % The main advantage of InDI is that it does not try to predict the clean target image in a single step but instead gradually improves the image in small steps, resulting in better perceptual quality. While generative denoising diffusion models also work in small steps, our formulation is distinct in that it does not require knowledge of any analytic form of the degradation process. Instead, we directly learn an iterative restoration process from low-quality and high-quality paired examples. InDI can be applied to virtually any image degradation, given paired training data. In conditional denoising diffusion image restoration the denoising network generates the restored image by repeatedly denoising an initial image of pure noise, conditioned on the degraded input. Contrary to conditional denoising formulations, InDI directly proceeds by iteratively restoring the input low-quality image, producing high-quality results on a variety of image restoration tasks, including motion and out-of-focus deblurring, super-resolution, compression artifact removal, and denoising.
    Style Miner: Find Significant and Stable Explanatory Factors in Time Series with Constrained Reinforcement Learning. (arXiv:2303.11716v1 [cs.LG])
    In high-dimensional time-series analysis, it is essential to have a set of key factors (namely, the style factors) that explain the change of the observed variable. For example, volatility modeling in finance relies on a set of risk factors, and climate change studies in climatology rely on a set of causal factors. The ideal low-dimensional style factors should balance significance (with high explanatory power) and stability (consistent, no significant fluctuations). However, previous supervised and unsupervised feature extraction methods can hardly address the tradeoff. In this paper, we propose Style Miner, a reinforcement learning method to generate style factors. We first formulate the problem as a Constrained Markov Decision Process with explanatory power as the return and stability as the constraint. Then, we design fine-grained immediate rewards and costs and use a Lagrangian heuristic to balance them adaptively. Experiments on real-world financial data sets show that Style Miner outperforms existing learning-based methods by a large margin and achieves a relatively 10% gain in R-squared explanatory power compared to the industry-renowned factors proposed by human experts.
    Lidar Line Selection with Spatially-Aware Shapley Value for Cost-Efficient Depth Completion. (arXiv:2303.11720v1 [cs.LG])
    Lidar is a vital sensor for estimating the depth of a scene. Typical spinning lidars emit pulses arranged in several horizontal lines and the monetary cost of the sensor increases with the number of these lines. In this work, we present the new problem of optimizing the positioning of lidar lines to find the most effective configuration for the depth completion task. We propose a solution to reduce the number of lines while retaining the up-to-the-mark quality of depth completion. Our method consists of two components, (1) line selection based on the marginal contribution of a line computed via the Shapley value and (2) incorporating line position spread to take into account its need to arrive at image-wide depth completion. Spatially-aware Shapley values (SaS) succeed in selecting line subsets that yield a depth accuracy comparable to the full lidar input while using just half of the lines.
    Recommendation Systems in Libraries: an Application with Heterogeneous Data Sources. (arXiv:2303.11746v1 [cs.IR])
    The Reading&Machine project exploits the support of digitalization to increase the attractiveness of libraries and improve the users' experience. The project implements an application that helps the users in their decision-making process, providing recommendation system (RecSys)-generated lists of books the users might be interested in, and showing them through an interactive Virtual Reality (VR)-based Graphical User Interface (GUI). In this paper, we focus on the design and testing of the recommendation system, employing data about all users' loans over the past 9 years from the network of libraries located in Turin, Italy. In addition, we use data collected by the Anobii online social community of readers, who share their feedback and additional information about books they read. Armed with this heterogeneous data, we build and evaluate Content Based (CB) and Collaborative Filtering (CF) approaches. Our results show that the CF outperforms the CB approach, improving by up to 47\% the relevant recommendations provided to a reader. However, the performance of the CB approach is heavily dependent on the number of books the reader has already read, and it can work even better than CF for users with a large history. Finally, our evaluations highlight that the performances of both approaches are significantly improved if the system integrates and leverages the information from the Anobii dataset, which allows us to include more user readings (for CF) and richer book metadata (for CB).
    Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale. (arXiv:2303.11369v1 [cs.LG])
    In this paper, we address the following problem: Given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs. We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses the offline dataset, and information about the expert's behavioral policy used to generate the offline dataset. Its cumulative Bayesian regret goes down to zero exponentially fast in N, the offline dataset size if the expert is competent enough. Since this algorithm is computationally impractical, we then propose the iRLSVI algorithm that can be seen as a combination of the RLSVI algorithm for online RL, and imitation learning. Our empirical results show that the proposed iRLSVI algorithm is able to achieve significant reduction in regret as compared to two baselines: no offline data, and offline dataset but used without information about the generative policy. Our algorithm bridges online RL and imitation learning for the first time.
    Data Augmentation For Label Enhancement. (arXiv:2303.11698v1 [cs.LG])
    Label distribution (LD) uses the description degree to describe instances, which provides more fine-grained supervision information when learning with label ambiguity. Nevertheless, LD is unavailable in many real-world applications. To obtain LD, label enhancement (LE) has emerged to recover LD from logical label. Existing LE approach have the following problems: (\textbf{i}) They use logical label to train mappings to LD, but the supervision information is too loose, which can lead to inaccurate model prediction; (\textbf{ii}) They ignore feature redundancy and use the collected features directly. To solve (\textbf{i}), we use the topology of the feature space to generate more accurate label-confidence. To solve (\textbf{ii}), we proposed a novel supervised LE dimensionality reduction approach, which projects the original data into a lower dimensional feature space. Combining the above two, we obtain the augmented data for LE. Further, we proposed a novel nonlinear LE model based on the label-confidence and reduced features. Extensive experiments on 12 real-world datasets are conducted and the results show that our method consistently outperforms the other five comparing approaches.
    Linking generative semi-supervised learning and generative open-set recognition. (arXiv:2303.11702v1 [cs.CV])
    This study investigates the relationship between semi-supervised learning (SSL) and open-set recognition (OSR) in the context of generative adversarial networks (GANs). Although no previous study has formally linked SSL and OSR, their respective methods share striking similarities. Specifically, SSL-GANs and OSR-GANs require generator to produce samples in the complementary space. Subsequently, by regularising networks with generated samples, both SSL and OSR classifiers generalize the open space. To demonstrate the connection between SSL and OSR, we theoretically and experimentally compare state-of-the-art SSL-GAN methods with state-of-the-art OSR-GAN methods. Our results indicate that the SSL optimised margin-GANs, which have a stronger foundation in literature, set the new standard for the combined SSL-OSR task and achieves new state-of-other art results in certain general OSR experiments. However, the OSR optimised adversarial reciprocal point (ARP)-GANs still slightly out-performed margin-GANs at other OSR experiments. This result indicates unique insights for the combined optimisation task of SSL-OSR.
    Efficient Multi-stage Inference on Tabular Data. (arXiv:2303.11580v1 [cs.LG])
    Many ML applications and products train on medium amounts of input data but get bottlenecked in real-time inference. When implementing ML systems, conventional wisdom favors segregating ML code into services queried by product code via Remote Procedure Call (RPC) APIs. This approach clarifies the overall software architecture and simplifies product code by abstracting away ML internals. However, the separation adds network latency and entails additional CPU overhead. Hence, we simplify inference algorithms and embed them into the product code to reduce network communication. For public datasets and a high-performance real-time platform that deals with tabular data, we show that over half of the inputs are often amenable to such optimization, while the remainder can be handled by the original model. By applying our optimization with AutoML to both training and inference, we reduce inference latency by 1.3x, CPU resources by 30%, and network communication between application front-end and ML back-end by about 50% for a commercial end-to-end ML platform that serves millions of real-time decisions per second.
    DIPPM: a Deep Learning Inference Performance Predictive Model using Graph Neural Networks. (arXiv:2303.11733v1 [cs.PF])
    Deep Learning (DL) has developed to become a corner-stone in many everyday applications that we are now relying on. However, making sure that the DL model uses the underlying hardware efficiently takes a lot of effort. Knowledge about inference characteristics can help to find the right match so that enough resources are given to the model, but not too much. We have developed a DL Inference Performance Predictive Model (DIPPM) that predicts the inference latency, energy, and memory usage of a given input DL model on the NVIDIA A100 GPU. We also devised an algorithm to suggest the appropriate A100 Multi-Instance GPU profile from the output of DIPPM. We developed a methodology to convert DL models expressed in multiple frameworks to a generalized graph structure that is used in DIPPM. It means DIPPM can parse input DL models from various frameworks. Our DIPPM can be used not only helps to find suitable hardware configurations but also helps to perform rapid design-space exploration for the inference performance of a model. We constructed a graph multi-regression dataset consisting of 10,508 different DL models to train and evaluate the performance of DIPPM, and reached a resulting Mean Absolute Percentage Error (MAPE) as low as 1.9%.
    Dens-PU: PU Learning with Density-Based Positive Labeled Augmentation. (arXiv:2303.11848v1 [cs.LG])
    This study proposes a novel approach for solving the PU learning problem based on an anomaly-detection strategy. Latent encodings extracted from positive-labeled data are linearly combined to acquire new samples. These new samples are used as embeddings to increase the density of positive-labeled data and, thus, define a boundary that approximates the positive class. The further a sample is from the boundary the more it is considered as a negative sample. Once a set of negative samples is obtained, the PU learning problem reduces to binary classification. The approach, named Dens-PU due to its reliance on the density of positive-labeled data, was evaluated using benchmark image datasets, and state- of-the-art results were attained.
    How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer. (arXiv:2303.11454v1 [cs.LG])
    Randomized neural networks (randomized NNs), where only the terminal layer's weights are optimized constitute a powerful model class to reduce computational time in training the neural network model. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in function space) of the generalization behavior of randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs correspond to a generalized additive model (GAM)-typed regression in which infinitely many directions are considered: the infinite generalized additive model (IGAM). The IGAM is formalized as solution to an optimization problem in function space for a specific regularization functional and a fairly general loss. This work is an extension to multivariate NNs of prior work, where we showed how wide RSNs with ReLU activation behave like spline regression under certain conditions and if the input is one-dimensional.
    Reasonable Scale Machine Learning with Open-Source Metaflow. (arXiv:2303.11761v1 [cs.LG])
    As Machine Learning (ML) gains adoption across industries and new use cases, practitioners increasingly realize the challenges around effectively developing and iterating on ML systems: reproducibility, debugging, scalability, and documentation are elusive goals for real-world pipelines outside tech-first companies. In this paper, we review the nature of ML-oriented workloads and argue that re-purposing existing tools won't solve the current productivity issues, as ML peculiarities warrant specialized development tooling. We then introduce Metaflow, an open-source framework for ML projects explicitly designed to boost the productivity of data practitioners by abstracting away the execution of ML code from the definition of the business logic. We show how our design addresses the main challenges in ML operations (MLOps), and document through examples, interviews and use cases its practical impact on the field.
    Lamarr: LHCb ultra-fast simulation based on machine learning models deployed within Gauss. (arXiv:2303.11428v1 [hep-ex])
    About 90% of the computing resources available to the LHCb experiment has been spent to produce simulated data samples for Run 2 of the Large Hadron Collider at CERN. The upgraded LHCb detector will be able to collect larger data samples, requiring many more simulated events to analyze the data to be collected in Run 3. Simulation is a key necessity of analysis to interpret signal vs background and measure efficiencies. The needed simulation will far exceed the pledged resources, requiring an evolution in technologies and techniques to produce these simulated data samples. In this contribution, we discuss Lamarr, a Gaudi-based framework to speed-up the simulation production parametrizing both the detector response and the reconstruction algorithms of the LHCb experiment. Deep Generative Models powered by several algorithms and strategies are employed to effectively parametrize the high-level response of the single components of the LHCb detector, encoding within neural networks the experimental errors and uncertainties introduced in the detection and reconstruction phases. Where possible, models are trained directly on real data, statistically subtracting any background components through weights application. Embedding Lamarr in the general LHCb Gauss Simulation framework allows to combine its execution with any of the available generators in a seamless way. The resulting software package enables a simulation process completely independent of the Detailed Simulation used to date.
    The Threat of Adversarial Attacks on Machine Learning in Network Security - A Survey. (arXiv:1911.02621v3 [cs.CR] UPDATED)
    Machine learning models have made many decision support systems to be faster, more accurate, and more efficient. However, applications of machine learning in network security face a more disproportionate threat of active adversarial attacks compared to other domains. This is because machine learning applications in network security such as malware detection, intrusion detection, and spam filtering are by themselves adversarial in nature. In what could be considered an arm's race between attackers and defenders, adversaries constantly probe machine learning systems with inputs that are explicitly designed to bypass the system and induce a wrong prediction. In this survey, we first provide a taxonomy of machine learning techniques, tasks, and depth. We then introduce a classification of machine learning in network security applications. Next, we examine various adversarial attacks against machine learning in network security and introduce two classification approaches for adversarial attacks in network security. First, we classify adversarial attacks in network security based on a taxonomy of network security applications. Secondly, we categorize adversarial attacks in network security into a problem space vs feature space dimensional classification model. We then analyze the various defenses against adversarial attacks on machine learning-based network security applications. We conclude by introducing an adversarial risk grid map and evaluating several existing adversarial attacks against machine learning in network security using the risk grid map. We also identify where each attack classification resides within the adversarial risk grid map.
    Dynamic-Aware Loss for Learning with Label Noise. (arXiv:2303.11562v1 [cs.LG])
    Label noise poses a serious threat to deep neural networks (DNNs). Employing robust loss function which reconciles fitting ability with robustness is a simple but effective strategy to handle this problem. However, the widely-used static trade-off between these two factors contradicts the dynamic nature of DNNs learning with label noise, leading to inferior performance. Therefore, we propose a dynamics-aware loss (DAL) to solve this problem. Considering that DNNs tend to first learn generalized patterns, then gradually overfit label noise, DAL strengthens the fitting ability initially, then gradually increases the weight of robustness. Moreover, at the later stage, we let DNNs put more emphasis on easy examples which are more likely to be correctly labeled than hard ones and introduce a bootstrapping term to further reduce the negative impact of label noise. Both the detailed theoretical analyses and extensive experimental results demonstrate the superiority of our method.
    End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations. (arXiv:2303.12002v1 [eess.AS])
    Recent works show that speech separation guided diarization (SSGD) is an increasingly promising direction, mainly thanks to the recent progress in speech separation. It performs diarization by first separating the speakers and then applying voice activity detection (VAD) on each separated stream. In this work we conduct an in-depth study of SSGD in the conversational telephone speech (CTS) domain, focusing mainly on low-latency streaming diarization applications. We consider three state-of-the-art speech separation (SSep) algorithms and study their performance both in online and offline scenarios, considering non-causal and causal implementations as well as continuous SSep (CSS) windowed inference. We compare different SSGD algorithms on two widely used CTS datasets: CALLHOME and Fisher Corpus (Part 1 and 2) and evaluate both separation and diarization performance. To improve performance, a novel, causal and computationally efficient leakage removal algorithm is proposed, which significantly decreases false alarms. We also explore, for the first time, fully end-to-end SSGD integration between SSep and VAD modules. Crucially, this enables fine-tuning on real-world data for which oracle speakers sources are not available. In particular, our best model achieves 8.8% DER on CALLHOME, which outperforms the current state-of-the-art end-to-end neural diarization model, despite being trained on an order of magnitude less data and having significantly lower latency, i.e., 0.1 vs. 1 seconds. Finally, we also show that the separated signals can be readily used also for automatic speech recognition, reaching performance close to using oracle sources in some configurations.
    Large Language Models and Simple, Stupid Bugs. (arXiv:2303.11455v1 [cs.SE])
    With the advent of powerful neural language models, AI-based systems to assist developers in coding tasks are becoming widely available; Copilot is one such system. Copilot uses Codex, a large language model (LLM), to complete code conditioned on a preceding "prompt". Codex, however, is trained on public GitHub repositories, viz., on code that may include bugs and vulnerabilities. Previous studies [1], [2] show Codex reproduces vulnerabilities seen in training. In this study, we examine how prone Codex is to generate an interesting bug category, single statement bugs, commonly referred to as simple, stupid bugs or SStuBs in the MSR community. We find that Codex and similar LLMs do help avoid some SStuBs, but do produce known, verbatim SStuBs as much as 2x as likely than known, verbatim correct code. We explore the consequences of the Codex generated SStuBs and propose avoidance strategies that suggest the possibility of reducing the production of known, verbatim SStubs, and increase the possibility of producing known, verbatim fixes.
    Bias mitigation techniques in image classification: fair machine learning in human heritage collections. (arXiv:2303.11449v1 [cs.CV])
    A major problem with using automated classification systems is that if they are not engineered correctly and with fairness considerations, they could be detrimental to certain populations. Furthermore, while engineers have developed cutting-edge technologies for image classification, there is still a gap in the application of these models in human heritage collections, where data sets usually consist of low-quality pictures of people with diverse ethnicity, gender, and age. In this work, we evaluate three bias mitigation techniques using two state-of-the-art neural networks, Xception and EfficientNet, for gender classification. Moreover, we explore the use of transfer learning using a fair data set to overcome the training data scarcity. We evaluated the effectiveness of the bias mitigation pipeline on a cultural heritage collection of photographs from the 19th and 20th centuries, and we used the FairFace data set for the transfer learning experiments. After the evaluation, we found that transfer learning is a good technique that allows better performance when working with a small data set. Moreover, the fairest classifier was found to be accomplished using transfer learning, threshold change, re-weighting and image augmentation as bias mitigation methods.
    An Embarrassingly Simple Approach for Wafer Feature Extraction and Defect Pattern Recognition. (arXiv:2303.11632v1 [cs.CV])
    Identifying defect patterns in a wafer map during manufacturing is crucial to find the root cause of the underlying issue and provides valuable insights on improving yield in the foundry. Currently used methods use deep neural networks to identify the defects. These methods are generally very huge and have significant inference time. They also require GPU support to efficiently operate. All these issues make these models not fit for on-line prediction in the manufacturing foundry. In this paper, we propose an extremely simple yet effective technique to extract features from wafer images. The proposed method is extremely fast, intuitive, and non-parametric while being explainable. The experiment results show that the proposed pipeline outperforms conventional deep learning models. Our feature extraction requires no training or fine-tuning while preserving the relative shape and location of data points as revealed by our interpretability analysis.
    Online Learning for Equilibrium Pricing in Markets under Incomplete Information. (arXiv:2303.11522v1 [cs.GT])
    The study of market equilibria is central to economic theory, particularly in efficiently allocating scarce resources. However, the computation of equilibrium prices at which the supply of goods matches their demand typically relies on having access to complete information on private attributes of agents, e.g., suppliers' cost functions, which are often unavailable in practice. Motivated by this practical consideration, we consider the problem of setting equilibrium prices in the incomplete information setting wherein a market operator seeks to satisfy the customer demand for a commodity by purchasing the required amount from competing suppliers with privately known cost functions unknown to the market operator. In this incomplete information setting, we consider the online learning problem of learning equilibrium prices over time while jointly optimizing three performance metrics -- unmet demand, cost regret, and payment regret -- pertinent in the context of equilibrium pricing over a horizon of $T$ periods. We first consider the setting when suppliers' cost functions are fixed and develop algorithms that achieve a regret of $O(\log \log T)$ when the customer demand is constant over time, or $O(\sqrt{T} \log \log T)$ when the demand is variable over time. Next, we consider the setting when the suppliers' cost functions can vary over time and illustrate that no online algorithm can achieve sublinear regret on all three metrics when the market operator has no information about how the cost functions change over time. Thus, we consider an augmented setting wherein the operator has access to hints/contexts that, without revealing the complete specification of the cost functions, reflect the variation in the cost functions over time and propose an algorithm with sublinear regret in this augmented setting.
    Feature-adjacent multi-fidelity physics-informed machine learning for partial differential equations. (arXiv:2303.11577v1 [cs.LG])
    Physics-informed neural networks have emerged as an alternative method for solving partial differential equations. However, for complex problems, the training of such networks can still require high-fidelity data which can be expensive to generate. To reduce or even eliminate the dependency on high-fidelity data, we propose a novel multi-fidelity architecture which is based on a feature space shared by the low- and high-fidelity solutions. In the feature space, the projections of the low-fidelity and high-fidelity solutions are adjacent by constraining their relative distance. The feature space is represented with an encoder and its mapping to the original solution space is effected through a decoder. The proposed multi-fidelity approach is validated on forward and inverse problems for steady and unsteady problems described by partial differential equations.
    Materials Discovery with Extreme Properties via AI-Driven Combinatorial Chemistry. (arXiv:2303.11833v1 [q-bio.BM])
    The goal of most materials discovery is to discover materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop AI-driven combinatorial chemistry, which is a rule-based inverse molecular designer that does not rely on data. Since our model has the potential to generate all possible molecular structures that can be obtained from combinations of molecular fragments, unknown materials with superior properties can be discovered. We theoretically and empirically demonstrate that our model is more suitable for discovering better materials than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven target properties, our model discovered 1,315 of all target-hitting molecules and 7,629 of five target-hitting molecules out of 100,000 trials, whereas the probability distribution-learning models failed. To illustrate the performance in actual problems, we also demonstrate that our models work well on two practical applications: discovering protein docking materials and HIV inhibitors.
    Dynamic Healthcare Embeddings for Improving Patient Care. (arXiv:2303.11563v1 [cs.LG])
    As hospitals move towards automating and integrating their computing systems, more fine-grained hospital operations data are becoming available. These data include hospital architectural drawings, logs of interactions between patients and healthcare professionals, prescription data, procedures data, and data on patient admission, discharge, and transfers. This has opened up many fascinating avenues for healthcare-related prediction tasks for improving patient care. However, in order to leverage off-the-shelf machine learning software for these tasks, one needs to learn structured representations of entities involved from heterogeneous, dynamic data streams. Here, we propose DECENT, an auto-encoding heterogeneous co-evolving dynamic neural network, for learning heterogeneous dynamic embeddings of patients, doctors, rooms, and medications from diverse data streams. These embeddings capture similarities among doctors, rooms, patients, and medications based on static attributes and dynamic interactions. DECENT enables several applications in healthcare prediction, such as predicting mortality risk and case severity of patients, adverse events (e.g., transfer back into an intensive care unit), and future healthcare-associated infections. The results of using the learned patient embeddings in predictive modeling show that DECENT has a gain of up to 48.1% on the mortality risk prediction task, 12.6% on the case severity prediction task, 6.4% on the medical intensive care unit transfer task, and 3.8% on the Clostridioides difficile (C.diff) Infection (CDI) prediction task over the state-of-the-art baselines. In addition, case studies on the learned doctor, medication, and room embeddings show that our approach learns meaningful and interpretable embeddings.
    A High-Frequency Focused Network for Lightweight Single Image Super-Resolution. (arXiv:2303.11701v1 [eess.IV])
    Lightweight neural networks for single-image super-resolution (SISR) tasks have made substantial breakthroughs in recent years. Compared to low-frequency information, high-frequency detail is much more difficult to reconstruct. Most SISR models allocate equal computational resources for low-frequency and high-frequency information, which leads to redundant processing of simple low-frequency information and inadequate recovery of more challenging high-frequency information. We propose a novel High-Frequency Focused Network (HFFN) through High-Frequency Focused Blocks (HFFBs) that selectively enhance high-frequency information while minimizing redundant feature computation of low-frequency information. The HFFB effectively allocates more computational resources to the more challenging reconstruction of high-frequency information. Moreover, we propose a Local Feature Fusion Block (LFFB) effectively fuses features from multiple HFFBs in a local region, utilizing complementary information across layers to enhance feature representativeness and reduce artifacts in reconstructed images. We assess the efficacy of our proposed HFFN on five benchmark datasets and show that it significantly enhances the super-resolution performance of the network. Our experimental results demonstrate state-of-the-art performance in reconstructing high-frequency information while using a low number of parameters.
    MSTFormer: Motion Inspired Spatial-temporal Transformer with Dynamic-aware Attention for long-term Vessel Trajectory Prediction. (arXiv:2303.11540v1 [cs.LG])
    Incorporating the dynamics knowledge into the model is critical for achieving accurate trajectory prediction while considering the spatial and temporal characteristics of the vessel. However, existing methods rarely consider the underlying dynamics knowledge and directly use machine learning algorithms to predict the trajectories. Intuitively, the vessel's motions are following the laws of dynamics, e.g., the speed of a vessel decreases when turning a corner. Yet, it is challenging to combine dynamic knowledge and neural networks due to their inherent heterogeneity. Against this background, we propose MSTFormer, a motion inspired vessel trajectory prediction method based on Transformer. The contribution of this work is threefold. First, we design a data augmentation method to describe the spatial features and motion features of the trajectory. Second, we propose a Multi-headed Dynamic-aware Self-attention mechanism to focus on trajectory points with frequent motion transformations. Finally, we construct a knowledge-inspired loss function to further boost the performance of the model. Experimental results on real-world datasets show that our strategy not only effectively improves long-term predictive capability but also outperforms backbones on cornering data.The ablation analysis further confirms the efficacy of the proposed method. To the best of our knowledge, MSTFormer is the first neural network model for trajectory prediction fused with vessel motion dynamics, providing a worthwhile direction for future research.The source code is available at https://github.com/simple316/MSTFormer.
    Deep Q-Network Based Decision Making for Autonomous Driving. (arXiv:2303.11634v1 [cs.RO])
    Currently decision making is one of the biggest challenges in autonomous driving. This paper introduces a method for safely navigating an autonomous vehicle in highway scenarios by combining deep Q-Networks and insight from control theory. A Deep Q-Network is trained in simulation to serve as a central decision-making unit by proposing targets for a trajectory planner. The generated trajectories in combination with a controller for longitudinal movement are used to execute lane change maneuvers. In order to prove the functionality of this approach it is evaluated on two different highway traffic scenarios. Furthermore, the impact of different state representations on the performance and training process is analyzed. The results show that the proposed system can produce efficient and safe driving behavior.
    Learning and controlling the source-filter representation of speech with a variational autoencoder. (arXiv:2204.07075v3 [cs.SD] UPDATED)
    Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, inspiring from the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_0$ and the formants are of primary importance. In this work, we start from a variational autoencoder (VAE) trained in an unsupervised manner on a large dataset of unlabeled natural speech signals, and we show that the source-filter model of speech production naturally arises as orthogonal subspaces of the VAE latent space. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies, we show that these subspaces are orthogonal, and based on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces. Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms that is conditioned on $f_0$ and the formant frequencies, and which is applied to the transformation speech signals. Finally, we also propose a robust $f_0$ estimation method that exploits the projection of a speech signal onto the learned latent subspace associated with $f_0$.
    Fairness-Aware Graph Filter Design. (arXiv:2303.11459v1 [cs.LG])
    Graphs are mathematical tools that can be used to represent complex real-world systems, such as financial markets and social networks. Hence, machine learning (ML) over graphs has attracted significant attention recently. However, it has been demonstrated that ML over graphs amplifies the already existing bias towards certain under-represented groups in various decision-making problems due to the information aggregation over biased graph structures. Faced with this challenge, in this paper, we design a fair graph filter that can be employed in a versatile manner for graph-based learning tasks. The design of the proposed filter is based on a bias analysis and its optimality in mitigating bias compared to its fairness-agnostic counterpart is established. Experiments on real-world networks for node classification demonstrate the efficacy of the proposed filter design in mitigating bias, while attaining similar utility and better stability compared to baseline algorithms.
    ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers. (arXiv:2303.11551v1 [cs.CV])
    Lack of audio-video synchronization is a common problem during television broadcasts and video conferencing, leading to an unsatisfactory viewing experience. A widely accepted paradigm is to create an error detection mechanism that identifies the cases when audio is leading or lagging. We propose ModEFormer, which independently extracts audio and video embeddings using modality-specific transformers. Different from the other transformer-based approaches, ModEFormer preserves the modality of the input streams which allows us to use a larger batch size with more negative audio samples for contrastive learning. Further, we propose a trade-off between the number of negative samples and number of unique samples in a batch to significantly exceed the performance of previous methods. Experimental results show that ModEFormer achieves state-of-the-art performance, 94.5% for LRS2 and 90.9% for LRS3. Finally, we demonstrate how ModEFormer can be used for offset detection for test clips.
    Heart Murmur and Abnormal PCG Detection via Wavelet Scattering Transform & a 1D-CNN. (arXiv:2303.11423v1 [eess.SP])
    This work leverages deep learning (DL) techniques in order to do automatic and accurate heart murmur detection from phonocardiogram (PCG) recordings. Two public PCG datasets (CirCor Digiscope 2022 dataset and PCG 2016 dataset) from Physionet online database are utilized to train and test three custom neural networks (NN): a 1D convolutional neural network (CNN), a long short-term memory (LSTM) recurrent neural network (RNN), and a convolutional RNN (C-RNN). Under our proposed method, we first do pre-processing on both datasets in order to prepare the data for the NNs. Key pre-processing steps include the following: denoising, segmentation, re-labeling of noise-only segments, data normalization, and time-frequency analysis of the PCG segments using wavelet scattering transform. To evaluate the performance of the three NNs we have implemented, we conduct four experiments, first three using PCG 2022 dataset, and fourth using PCG 2016 dataset. It turns out that our custom 1D-CNN outperforms other two NNs (LSTM- RNN and C-RNN) as well as the state-of-the-art. Specifically, for experiment E1 (murmur detection using original PCG 2022 dataset), our 1D-CNN model achieves an accuracy of 82.28%, weighted accuracy of 83.81%, F1-score of 65.79%, and and area under receive operating charactertic (AUROC) curve of 90.79%. For experiment E2 (mumur detection using PCG 2022 dataset with unknown class removed), our 1D-CNN model achieves an accuracy of 87.05%, F1-score of 87.72%, and AUROC of 94.4%. For experiment E3 (murmur detection using PCG 2022 dataset with re-labeling of segments), our 1D-CNN model achieves an accuracy of 82.86%, weighted accuracy of 86.30%, F1-score of 81.87%, and AUROC of 93.45%. For experiment E4 (abnormal PCG detection using PCG 2016 dataset), our 1D-CNN model achieves an accuracy of 96.30%, F1-score of 96.29% and AUROC of 98.17%.
    STDLens: Model Hijacking-resilient Federated Learning for Object Detection. (arXiv:2303.11511v1 [cs.CR])
    Federated Learning (FL) has been gaining popularity as a collaborative learning framework to train deep learning-based object detection models over a distributed population of clients. Despite its advantages, FL is vulnerable to model hijacking. The attacker can control how the object detection system should misbehave by implanting Trojaned gradients using only a small number of compromised clients in the collaborative learning process. This paper introduces STDLens, a principled approach to safeguarding FL against such attacks. We first investigate existing mitigation mechanisms and analyze their failures caused by the inherent errors in spatial clustering analysis on gradients. Based on the insights, we introduce a three-tier forensic framework to identify and expel Trojaned gradients and reclaim the performance over the course of FL. We consider three types of adaptive attacks and demonstrate the robustness of STDLens against advanced adversaries. Extensive experiments show that STDLens can protect FL against different model hijacking attacks and outperform existing methods in identifying and removing Trojaned gradients with significantly higher precision and much lower false-positive rates.
    Recursive Euclidean Distance Based Robust Aggregation Technique For Federated Learning. (arXiv:2303.11337v1 [cs.LG])
    Federated learning has gained popularity as a solution to data availability and privacy challenges in machine learning. However, the aggregation process of local model updates to obtain a global model in federated learning is susceptible to malicious attacks, such as backdoor poisoning, label-flipping, and membership inference. Malicious users aim to sabotage the collaborative learning process by training the local model with malicious data. In this paper, we propose a novel robust aggregation approach based on recursive Euclidean distance calculation. Our approach measures the distance of the local models from the previous global model and assigns weights accordingly. Local models far away from the global model are assigned smaller weights to minimize the data poisoning effect during aggregation. Our experiments demonstrate that the proposed algorithm outperforms state-of-the-art algorithms by at least $5\%$ in accuracy while reducing time complexity by less than $55\%$. Our contribution is significant as it addresses the critical issue of malicious attacks in federated learning while improving the accuracy of the global model.  ( 2 min )
    Universal Smoothed Score Functions for Generative Modeling. (arXiv:2303.11669v1 [stat.ML])
    We consider the problem of generative modeling based on smoothing an unknown density of interest in $\mathbb{R}^d$ using factorial kernels with $M$ independent Gaussian channels with equal noise levels introduced by Saremi and Srivastava (2022). First, we fully characterize the time complexity of learning the resulting smoothed density in $\mathbb{R}^{Md}$, called M-density, by deriving a universal form for its parametrization in which the score function is by construction permutation equivariant. Next, we study the time complexity of sampling an M-density by analyzing its condition number for Gaussian distributions. This spectral analysis gives a geometric insight on the "shape" of M-densities as one increases $M$. Finally, we present results on the sample quality in this class of generative models on the CIFAR-10 dataset where we report Fr\'echet inception distances (14.15), notably obtained with a single noise level on long-run fast-mixing MCMC chains.
    Task-based Generation of Optimized Projection Sets using Differentiable Ranking. (arXiv:2303.11724v1 [cs.CV])
    We present a method for selecting valuable projections in computed tomography (CT) scans to enhance image reconstruction and diagnosis. The approach integrates two important factors, projection-based detectability and data completeness, into a single feed-forward neural network. The network evaluates the value of projections, processes them through a differentiable ranking function and makes the final selection using a straight-through estimator. Data completeness is ensured through the label provided during training. The approach eliminates the need for heuristically enforcing data completeness, which may exclude valuable projections. The method is evaluated on simulated data in a non-destructive testing scenario, where the aim is to maximize the reconstruction quality within a specified region of interest. We achieve comparable results to previous methods, laying the foundation for using reconstruction-based loss functions to learn the selection of projections.
    Adaptive Experimentation at Scale: Bayesian Algorithms for Flexible Batches. (arXiv:2303.11582v1 [cs.LG])
    Standard bandit algorithms that assume continual reallocation of measurement effort are challenging to implement due to delayed feedback and infrastructural/organizational difficulties. Motivated by practical instances involving a handful of reallocation epochs in which outcomes are measured in batches, we develop a new adaptive experimentation framework that can flexibly handle any batch size. Our main observation is that normal approximations universal in statistical inference can also guide the design of scalable adaptive designs. By deriving an asymptotic sequential experiment, we formulate a dynamic program that can leverage prior information on average rewards. State transitions of the dynamic program are differentiable with respect to the sampling allocations, allowing the use of gradient-based methods for planning and policy optimization. We propose a simple iterative planning method, Residual Horizon Optimization, which selects sampling allocations by optimizing a planning objective via stochastic gradient-based methods. Our method significantly improves statistical power over standard adaptive policies, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings which are difficult for standard adaptive policies, including problems with a small number of reallocation epochs, low signal-to-noise ratio, and unknown reward distributions.
    Targeted Analysis of High-Risk States Using an Oriented Variational Autoencoder. (arXiv:2303.11410v1 [eess.SY])
    Variational autoencoder (VAE) neural networks can be trained to generate power system states that capture both marginal distribution and multivariate dependencies of historical data. The coordinates of the latent space codes of VAEs have been shown to correlate with conceptual features of the data, which can be leveraged to synthesize targeted data with desired features. However, the locations of the VAEs' latent space codes that correspond to specific properties are not constrained. Additionally, the generation of data with specific characteristics may require data with corresponding hard-to-get labels fed into the generative model for training. In this paper, to make data generation more controllable and efficient, an oriented variation autoencoder (OVAE) is proposed to constrain the link between latent space code and generated data in the form of a Spearman correlation, which provides increased control over the data synthesis process. On this basis, an importance sampling process is used to sample data in the latent space. Two cases are considered for testing the performance of the OVAE model: the data set is fully labeled with approximate information and the data set is incompletely labeled but with more accurate information. The experimental results show that, in both cases, the OVAE model correlates latent space codes with the generated data, and the efficiency of generating targeted samples is significantly improved.
    Automatic Measures for Evaluating Generative Design Methods for Architects. (arXiv:2303.11483v1 [cs.CV])
    The recent explosion of high-quality image-to-image methods has prompted interest in applying image-to-image methods towards artistic and design tasks. Of interest for architects is to use these methods to generate design proposals from conceptual sketches, usually hand-drawn sketches that are quickly developed and can embody a design intent. More specifically, instantiating a sketch into a visual that can be used to elicit client feedback is typically a time consuming task, and being able to speed up this iteration time is important. While the body of work in generative methods has been impressive, there has been a mismatch between the quality measures used to evaluate the outputs of these systems and the actual expectations of architects. In particular, most recent image-based works place an emphasis on realism of generated images. While important, this is one of several criteria architects look for. In this work, we describe the expectations architects have for design proposals from conceptual sketches, and identify corresponding automated metrics from the literature. We then evaluate several image-to-image generative methods that may address these criteria and examine their performance across these metrics. From these results, we identify certain challenges with hand-drawn conceptual sketches and describe possible future avenues of investigation to address them.
    Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand. (arXiv:2211.14983v2 [cs.MA] UPDATED)
    We derive a learning framework to generate routing/pickup policies for a fleet of autonomous vehicles tasked with servicing stochastically appearing requests on a city map. We focus on policies that 1) give rise to coordination amongst the vehicles, thereby reducing wait times for servicing requests, 2) are non-myopic, and consider a-priori potential future requests, 3) can adapt to changes in the underlying demand distribution. Specifically, we are interested in policies that are adaptive to fluctuations of actual demand conditions in urban environments, such as on-peak vs. off-peak hours. We achieve this through a combination of (i) an online play algorithm that improves the performance of an offline-trained policy, and (ii) an offline approximation scheme that allows for adapting to changes in the underlying demand model. In particular, we achieve adaptivity of our learned policy to different demand distributions by quantifying a region of validity using the q-valid radius of a Wasserstein Ambiguity Set. We propose a mechanism for switching the originally trained offline approximation when the current demand is outside the original validity region. In this case, we propose to use an offline architecture, trained on a historical demand model that is closer to the current demand in terms of Wasserstein distance. We learn routing and pickup policies over real taxicab requests in San Francisco with high variability between on-peak and off-peak hours, demonstrating the ability of our method to adapt to real fluctuation in demand distributions. Our numerical results demonstrate that our method outperforms alternative rollout-based reinforcement learning schemes, as well as other classical methods from operations research.  ( 3 min )
    Solving High-Dimensional Inverse Problems with Auxiliary Uncertainty via Operator Learning with Limited Data. (arXiv:2303.11379v1 [stat.ML])
    In complex large-scale systems such as climate, important effects are caused by a combination of confounding processes that are not fully observable. The identification of sources from observations of system state is vital for attribution and prediction, which inform critical policy decisions. The difficulty of these types of inverse problems lies in the inability to isolate sources and the cost of simulating computational models. Surrogate models may enable the many-query algorithms required for source identification, but data challenges arise from high dimensionality of the state and source, limited ensembles of costly model simulations to train a surrogate model, and few and potentially noisy state observations for inversion due to measurement limitations. The influence of auxiliary processes adds an additional layer of uncertainty that further confounds source identification. We introduce a framework based on (1) calibrating deep neural network surrogates to the flow maps provided by an ensemble of simulations obtained by varying sources, and (2) using these surrogates in a Bayesian framework to identify sources from observations via optimization. Focusing on an atmospheric dispersion exemplar, we find that the expressive and computationally efficient nature of the deep neural network operator surrogates in appropriately reduced dimension allows for source identification with uncertainty quantification using limited data. Introducing a variable wind field as an auxiliary process, we find that a Bayesian approximation error approach is essential for reliable source inversion when uncertainty due to wind stresses the algorithm.  ( 2 min )
    What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring. (arXiv:2303.11341v1 [cs.LG])
    As advanced machine learning systems' capabilities begin to play a significant role in geopolitics and societal order, it may become imperative that (1) governments be able to enforce rules on the development of advanced ML systems within their borders, and (2) countries be able to verify each other's compliance with potential future international agreements on advanced ML development. This work analyzes one mechanism to achieve this, by monitoring the computing hardware used for large-scale NN training. The framework's primary goal is to provide governments high confidence that no actor uses large quantities of specialized ML chips to execute a training run in violation of agreed rules. At the same time, the system does not curtail the use of consumer computing devices, and maintains the privacy and confidentiality of ML practitioners' models, data, and hyperparameters. The system consists of interventions at three stages: (1) using on-chip firmware to occasionally save snapshots of the the neural network weights stored in device memory, in a form that an inspector could later retrieve; (2) saving sufficient information about each training run to prove to inspectors the details of the training run that had resulted in the snapshotted weights; and (3) monitoring the chip supply chain to ensure that no actor can avoid discovery by amassing a large quantity of un-tracked chips. The proposed design decomposes the ML training rule verification problem into a series of narrow technical challenges, including a new variant of the Proof-of-Learning problem [Jia et al. '21].  ( 3 min )
    Indeterminate Probability Neural Network. (arXiv:2303.11536v1 [cs.LG])
    We propose a new general model called IPNN - Indeterminate Probability Neural Network, which combines neural network and probability theory together. In the classical probability theory, the calculation of probability is based on the occurrence of events, which is hardly used in current neural networks. In this paper, we propose a new general probability theory, which is an extension of classical probability theory, and makes classical probability theory a special case to our theory. Besides, for our proposed neural network framework, the output of neural network is defined as probability events, and based on the statistical analysis of these events, the inference model for classification task is deduced. IPNN shows new property: It can perform unsupervised clustering while doing classification. Besides, IPNN is capable of making very large classification with very small neural network, e.g. model with 100 output nodes can classify 10 billion categories. Theoretical advantages are reflected in experimental results.
    Convergence of stochastic gradient descent on parameterized sphere with applications to variational Monte Carlo simulation. (arXiv:2303.11602v1 [cs.LG])
    We analyze stochastic gradient descent (SGD) type algorithms on a high-dimensional sphere which is parameterized by a neural network up to a normalization constant. We provide a new algorithm for the setting of supervised learning and show its convergence both theoretically and numerically. We also provide the first proof of convergence for the unsupervised setting, which corresponds to the widely used variational Monte Carlo (VMC) method in quantum physics.
    Assessor-Guided Learning for Continual Environments. (arXiv:2303.11624v1 [cs.LG])
    This paper proposes an assessor-guided learning strategy for continual learning where an assessor guides the learning process of a base learner by controlling the direction and pace of the learning process thus allowing an efficient learning of new environments while protecting against the catastrophic interference problem. The assessor is trained in a meta-learning manner with a meta-objective to boost the learning process of the base learner. It performs a soft-weighting mechanism of every sample accepting positive samples while rejecting negative samples. The training objective of a base learner is to minimize a meta-weighted combination of the cross entropy loss function, the dark experience replay (DER) loss function and the knowledge distillation loss function whose interactions are controlled in such a way to attain an improved performance. A compensated over-sampling (COS) strategy is developed to overcome the class imbalanced problem of the episodic memory due to limited memory budgets. Our approach, Assessor-Guided Learning Approach (AGLA), has been evaluated in the class-incremental and task-incremental learning problems. AGLA achieves improved performances compared to its competitors while the theoretical analysis of the COS strategy is offered. Source codes of AGLA, baseline algorithms and experimental logs are shared publicly in \url{https://github.com/anwarmaxsum/AGLA} for further study.
    Tensor networks for quantum machine learning. (arXiv:2303.11735v1 [quant-ph])
    Once developed for quantum theory, tensor networks have been established as a successful machine learning paradigm. Now, they have been ported back to the quantum realm in the emerging field of quantum machine learning to assess problems that classical computers are unable to solve efficiently. Their nature at the interface between physics and machine learning makes tensor networks easily deployable on quantum computers. In this review article, we shed light on one of the major architectures considered to be predestined for variational quantum machine learning. In particular, we discuss how layouts like MPS, PEPS, TTNs and MERA can be mapped to a quantum computer, how they can be used for machine learning and data encoding and which implementation techniques improve their performance.
    Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Watermarking. (arXiv:2303.11470v1 [cs.CR])
    The huge supporting training data on the Internet has been a key factor in the success of deep learning models. However, this abundance of public-available data also raises concerns about the unauthorized exploitation of datasets for commercial purposes, which is forbidden by dataset licenses. In this paper, we propose a backdoor-based watermarking approach that serves as a general framework for safeguarding public-available data. By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders. This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally. Unfortunately, existing backdoor insertion methods often entail adding arbitrary and mislabeled data to the training set, leading to a significant drop in performance and easy detection by anomaly detection algorithms. To overcome this challenge, we introduce a clean-label backdoor watermarking framework that uses imperceptible perturbations to replace mislabeled samples. As a result, the watermarking samples remain consistent with the original labels, making them difficult to detect. Our experiments on text, image, and audio datasets demonstrate that the proposed framework effectively safeguards datasets with minimal impact on original task performance. We also show that adding just 1% of watermarking samples can inject a traceable watermarking function and that our watermarking samples are stealthy and look benign upon visual inspection.  ( 2 min )
    Improving Content Retrievability in Search with Controllable Query Generation. (arXiv:2303.11648v1 [cs.IR])
    An important goal of online platforms is to enable content discovery, i.e. allow users to find a catalog entity they were not familiar with. A pre-requisite to discover an entity, e.g. a book, with a search engine is that the entity is retrievable, i.e. there are queries for which the system will surface such entity in the top results. However, machine-learned search engines have a high retrievability bias, where the majority of the queries return the same entities. This happens partly due to the predominance of narrow intent queries, where users create queries using the title of an already known entity, e.g. in book search 'harry potter'. The amount of broad queries where users want to discover new entities, e.g. in music search 'chill lyrical electronica with an atmospheric feeling to it', and have a higher tolerance to what they might find, is small in comparison. We focus here on two factors that have a negative impact on the retrievability of the entities (I) the training data used for dense retrieval models and (II) the distribution of narrow and broad intent queries issued in the system. We propose CtrlQGen, a method that generates queries for a chosen underlying intent-narrow or broad. We can use CtrlQGen to improve factor (I) by generating training data for dense retrieval models comprised of diverse synthetic queries. CtrlQGen can also be used to deal with factor (II) by suggesting queries with broader intents to users. Our results on datasets from the domains of music, podcasts, and books reveal that we can significantly decrease the retrievability bias of a dense retrieval model when using CtrlQGen. First, by using the generated queries as training data for dense models we make 9% of the entities retrievable (go from zero to non-zero retrievability). Second, by suggesting broader queries to users, we can make 12% of the entities retrievable in the best case.  ( 3 min )
    Scheduling and Aggregation Design for Asynchronous Federated Learning over Wireless Networks. (arXiv:2212.07356v2 [cs.LG] UPDATED)
    Federated Learning (FL) is a collaborative machine learning (ML) framework that combines on-device training and server-based aggregation to train a common ML model among distributed agents. In this work, we propose an asynchronous FL design with periodic aggregation to tackle the straggler issue in FL systems. Considering limited wireless communication resources, we investigate the effect of different scheduling policies and aggregation designs on the convergence performance. Driven by the importance of reducing the bias and variance of the aggregated model updates, we propose a scheduling policy that jointly considers the channel quality and training data representation of user devices. The effectiveness of our channel-aware data-importance-based scheduling policy, compared with state-of-the-art methods proposed for synchronous FL, is validated through simulations. Moreover, we show that an ``age-aware'' aggregation weighting design can significantly improve the learning performance in an asynchronous FL setting.  ( 2 min )
    Vibration Signal Denoising Using Deep Learning. (arXiv:2303.11413v1 [eess.SP])
    Structure vibration signals induced by footsteps are widely used for tasks like occupant identification, localization, human activity inference, structure health monitoring and so on. The vibration signals are collected as time series with amplitude values. However, the collected signals are always noisy in practice due to the influence of environmental noise, electromagnetic interference and other factors. The presence of noise affects the process of signal analysis, thus affecting the accuracy and error of the final tasks. In this paper, we mainly explore the denoising methods for footstep-induced vibration signals. We have considered different kinds of noise including stationary noises such as gaussian noises and non-stationary noises such as item-dropping vibration noise and music noises.
    Are uGLAD? Time will tell!. (arXiv:2303.11647v1 [cs.LG])
    We frequently encounter multiple series that are temporally correlated in our surroundings, such as EEG data to examine alterations in brain activity or sensors to monitor body movements. Segmentation of multivariate time series data is a technique for identifying meaningful patterns or changes in the time series that can signal a shift in the system's behavior. However, most segmentation algorithms have been designed primarily for univariate time series, and their performance on multivariate data remains largely unsatisfactory, making this a challenging problem. In this work, we introduce a novel approach for multivariate time series segmentation using conditional independence (CI) graphs. CI graphs are probabilistic graphical models that represents the partial correlations between the nodes. We propose a domain agnostic multivariate segmentation framework `$\texttt{tGLAD}$' which draws a parallel between the CI graph nodes and the variables of the time series. Consider applying a graph recovery model $\texttt{uGLAD}$ to a short interval of the time series, it will result in a CI graph that shows partial correlations among the variables. We extend this idea to the entire time series by utilizing a sliding window to create a batch of time intervals and then run a single $\texttt{uGLAD}$ model in multitask learning mode to recover all the CI graphs simultaneously. As a result, we obtain a corresponding temporal CI graphs representation. We then designed a first-order and second-order based trajectory tracking algorithms to study the evolution of these graphs across distinct intervals. Finally, an `Allocation' algorithm is used to determine a suitable segmentation of the temporal graph sequence. $\texttt{tGLAD}$ provides a competitive time complexity of $O(N)$ for settings where number of variables $D<<N$. We demonstrate successful empirical results on a Physical Activity Monitoring data.
    Mitigating Covertly Unsafe Text within Natural Language Systems. (arXiv:2210.09306v2 [cs.AI] UPDATED)
    An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences. However, the degree of explicitness of a generated statement that can cause physical harm varies. In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text. Then, we further break down this category with respect to the system's information and discuss solutions to mitigate the generation of text in each of these subcategories. Ultimately, our work defines the problem of covertly unsafe language that causes physical harm and argues that this subtle yet dangerous issue needs to be prioritized by stakeholders and regulators. We highlight mitigation strategies to inspire future researchers to tackle this challenging problem and help improve safety within smart systems.  ( 2 min )
    Valid Inference after Causal Discovery. (arXiv:2208.05949v2 [stat.ME] UPDATED)
    Causal discovery and causal effect estimation are two fundamental tasks in causal inference. While many methods have been developed for each task individually, statistical challenges arise when applying these methods jointly: estimating causal effects after running causal discovery algorithms on the same data leads to "double dipping," invalidating the coverage guarantees of classical confidence intervals. To this end, we develop tools for valid post-causal-discovery inference. Across empirical studies, we show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates; on the other hand, applying our method provides reliable coverage while achieving more accurate causal discovery than data splitting.  ( 2 min )
    Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm. (arXiv:2203.03186v2 [cs.LG] UPDATED)
    We study the corrupted bandit problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent adversary or Nature. To be specific, the reward obtained by playing an arm comes from corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and an arbitrary corruption distribution of unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide $\textit{a problem-dependent lower bound on the regret}$ of any corrupted bandit algorithm. The lower bounds indicate that the corrupted bandit problem is harder than the classical stochastic bandit problem with sub-Gaussian or heavy-tail rewards. Following that, we propose a novel UCB-type algorithm for corrupted bandits, namely HubUCB, that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality of Huber's estimator, we prove that HubUCB achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator that exhibits linear complexity. We leverage this sequential estimator to design SeqHubUCB that enjoys similar regret guarantees while reducing the computational burden. Finally, we experimentally illustrate the efficiency of HubUCB and SeqHubUCB in solving corrupted bandits for different reward distributions and different levels of corruptions.  ( 2 min )
    DeepSense 6G: A Large-Scale Real-World Multi-Modal Sensing and Communication Dataset. (arXiv:2211.09769v2 [eess.SP] UPDATED)
    This article presents the DeepSense 6G dataset, which is a large-scale dataset based on real-world measurements of co-existing multi-modal sensing and communication data. The DeepSense 6G dataset is built to advance deep learning research in a wide range of applications in the intersection of multi-modal sensing, communication, and positioning. This article provides a detailed overview of the DeepSense dataset structure, adopted testbeds, data collection and processing methodology, deployment scenarios, and example applications, with the objective of facilitating the adoption and reproducibility of multi-modal sensing and communication datasets.  ( 2 min )
    Contrastive learning for regression in multi-site brain age prediction. (arXiv:2211.08326v2 [eess.IV] UPDATED)
    Building accurate Deep Learning (DL) models for brain age prediction is a very relevant topic in neuroimaging, as it could help better understand neurodegenerative disorders and find new biomarkers. To estimate accurate and generalizable models, large datasets have been collected, which are often multi-site and multi-scanner. This large heterogeneity negatively affects the generalization performance of DL models since they are prone to overfit site-related noise. Recently, contrastive learning approaches have been shown to be more robust against noise in data or labels. For this reason, we propose a novel contrastive learning regression loss for robust brain age prediction using MRI scans. Our method achieves state-of-the-art performance on the OpenBHB challenge, yielding the best generalization capability and robustness to site-related noise.  ( 2 min )
    CoReS: Compatible Representations via Stationarity. (arXiv:2111.07632v2 [cs.CV] UPDATED)
    In this paper, we propose a novel method to learn internal feature representation models that are \textit{compatible} with previously learned ones. Compatible features enable for direct comparison of old and new learned features, allowing them to be used interchangeably over time. This eliminates the need for visual search systems to extract new features for all previously seen images in the gallery-set when sequentially upgrading the representation model. Extracting new features is typically quite expensive or infeasible in the case of very large gallery-sets and/or real time systems (i.e., face-recognition systems, social networks, life-long learning systems, robotics and surveillance systems). Our approach, called Compatible Representations via Stationarity (CoReS), achieves compatibility by encouraging stationarity to the learned representation model without relying on previously learned models. Stationarity allows features' statistical properties not to change under time shift so that the current learned features are inter-operable with the old ones. We evaluate single and sequential multi-model upgrading in growing large-scale training datasets and we show that our method improves the state-of-the-art in achieving compatible features by a large margin. In particular, upgrading ten times with training data taken from CASIA-WebFace and evaluating in Labeled Face in the Wild (LFW), we obtain a 49\% increase in measuring the average number of times compatibility is achieved, which is a 544\% relative improvement over previous state-of-the-art.  ( 3 min )
    Joint Differentiable Optimization and Verification for Certified Reinforcement Learning. (arXiv:2201.12243v2 [cs.LG] UPDATED)
    In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller. However, as existing methods typically apply formal verification \emph{after} the controller has been learned, it is sometimes difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is differentiable by the gradients from the value function and certificates. Experiments on a variety of examples demonstrate the significant advantages of our framework over the model-based stochastic value gradient (SVG) method and the model-free proximal policy optimization (PPO) method in finding feasible controllers with barrier functions and Lyapunov functions that ensure system safety and stability.  ( 2 min )
    Fix the Noise: Disentangling Source Feature for Controllable Domain Translation. (arXiv:2303.11545v1 [cs.CV])
    Recent studies show strong generative performance in domain translation especially by using transfer learning techniques on the unconditional generator. However, the control between different domain features using a single model is still challenging. Existing methods often require additional models, which is computationally demanding and leads to unsatisfactory visual quality. In addition, they have restricted control steps, which prevents a smooth transition. In this paper, we propose a new approach for high-quality domain translation with better controllability. The key idea is to preserve source features within a disentangled subspace of a target feature space. This allows our method to smoothly control the degree to which it preserves source features while generating images from an entirely new domain using only a single model. Our extensive experiments show that the proposed method can produce more consistent and realistic images than previous works and maintain precise controllability over different levels of transformation. The code is available at https://github.com/LeeDongYeun/FixNoise.
    Verifiable and Provably Secure Machine Unlearning. (arXiv:2210.09126v2 [cs.LG] UPDATED)
    Machine unlearning aims to remove points from the training dataset of a machine learning model after training; for example when a user requests their data to be deleted. While many machine unlearning methods have been proposed, none of them enable users to audit the procedure. Furthermore, recent work shows a user is unable to verify if their data was unlearnt from an inspection of the model alone. Rather than reasoning about model parameters, we propose to view verifiable unlearning as a security problem. To this end, we present the first cryptographic definition of verifiable unlearning to formally capture the guarantees of a machine unlearning system. In this framework, the server first computes a proof that the model was trained on a dataset $D$. Given a user data point $d$ requested to be deleted, the server updates the model using an unlearning algorithm. It then provides a proof of the correct execution of unlearning and that $d \notin D'$, where $D'$ is the new training dataset. Our framework is generally applicable to different unlearning techniques that we abstract as admissible functions. We instantiate the framework, based on cryptographic assumptions, using SNARKs and hash chains. Finally, we implement the protocol for three different unlearning techniques (retraining-based, amnesiac, and optimization-based) to validate its feasibility for linear regression, logistic regression, and neural networks.  ( 2 min )
    Hierarchical Memory Pool Based Edge Semi-Supervised Continual Learning Method. (arXiv:2303.11952v1 [cs.LG])
    The continuous changes in the world have resulted in the performance regression of neural networks. Therefore, continual learning (CL) area gradually attracts the attention of more researchers. For edge intelligence, the CL model not only needs to overcome catastrophic for-getting, but also needs to face the huge challenge of severely limited resources: the lack of labeled resources and powerful devices. However, the existing classic CL methods usually rely on a large number of labeled samples to maintain the plasticity and stability, and the semi-supervised learning methods often need to pay a large computational and memory overhead for higher accuracy. In response to these prob-lems, a low-cost semi-supervised CL method named Edge Hierarchical Memory Learner (EdgeHML) will be proposed. EdgeHML can effec-tively utilize a large number of unlabeled samples and a small number of labeled samples. It is based on a hierarchical memory pool, lever-age multi-level storage structure to store and replay samples. EdgeHML implements the interaction between different levels through a combination of online and offline strategies. In addition, in order to further reduce the computational overhead for unlabeled samples, EdgeHML leverages a progressive learning method. It reduces the computation cycles of unlabeled samples by controlling the learning process. The experimental results show that on three semi-supervised CL tasks, EdgeHML can improve the model accuracy by up to 16.35% compared with the classic CL method, and the training iterations time can be reduced by more than 50% compared with semi-supervised methods. EdgeHML achieves a semi-supervised CL process with high performance and low overhead for edge intelligence.  ( 2 min )
    A Case Study on Designing Evaluations of ML Explanations with Simulated User Studies. (arXiv:2302.07444v2 [cs.LG] UPDATED)
    When conducting user studies to ascertain the usefulness of model explanations in aiding human decision-making, it is important to use real-world use cases, data, and users. However, this process can be resource-intensive, allowing only a limited number of explanation methods to be evaluated. Simulated user evaluations (SimEvals), which use machine learning models as a proxy for human users, have been proposed as an intermediate step to select promising explanation methods. In this work, we conduct the first SimEvals on a real-world use case to evaluate whether explanations can better support ML-assisted decision-making in e-commerce fraud detection. We study whether SimEvals can corroborate findings from a user study conducted in this fraud detection context. In particular, we find that SimEvals suggest that all considered explainers are equally performant, and none beat a baseline without explanations -- this matches the conclusions of the original user study. Such correspondences between our results and the original user study provide initial evidence in favor of using SimEvals before running user studies. We also explore the use of SimEvals as a cheap proxy to explore an alternative user study set-up. We hope that this work motivates further study of when and how SimEvals should be used to aid in the design of real-world evaluations.  ( 2 min )
    Environmental Sensor Placement with Convolutional Gaussian Neural Processes. (arXiv:2211.10381v3 [stat.ML] UPDATED)
    Environmental sensors are crucial for monitoring weather conditions and the impacts of climate change. However, it is challenging to maximise measurement informativeness and place sensors efficiently, particularly in remote regions like Antarctica. Probabilistic machine learning models can evaluate placement informativeness by predicting the uncertainty reduction provided by a new sensor. Gaussian process (GP) models are widely used for this purpose, but they struggle with capturing complex non-stationary behaviour and scaling to large datasets. This paper proposes using a convolutional Gaussian neural process (ConvGNP) to address these issues. A ConvGNP uses neural networks to parameterise a joint Gaussian distribution at arbitrary target locations, enabling flexibility and scalability. Using simulated surface air temperature anomaly over Antarctica as ground truth, the ConvGNP learns spatial and seasonal non-stationarities, outperforming a non-stationary GP baseline. In a simulated sensor placement experiment, the ConvGNP better predicts the performance boost obtained from new observations than GP baselines, leading to more informative sensor placements. We connect our work with similar machine learning and physics-based approaches and discuss steps towards an operational sensor placement recommendation system.  ( 2 min )
    FedMAE: Federated Self-Supervised Learning with One-Block Masked Auto-Encoder. (arXiv:2303.11339v1 [cs.LG])
    Latest federated learning (FL) methods started to focus on how to use unlabeled data in clients for training due to users' privacy concerns, high labeling costs, or lack of expertise. However, current Federated Semi-Supervised/Self-Supervised Learning (FSSL) approaches fail to learn large-scale images because of the limited computing resources of local clients. In this paper, we introduce a new framework FedMAE, which stands for Federated Masked AutoEncoder, to address the problem of how to utilize unlabeled large-scale images for FL. Specifically, FedMAE can pre-train one-block Masked AutoEncoder (MAE) using large images in lightweight client devices, and then cascades multiple pre-trained one-block MAEs in the server to build a multi-block ViT backbone for downstream tasks. Theoretical analysis and experimental results on image reconstruction and classification show that our FedMAE achieves superior performance compared to the state-of-the-art FSSL methods.
    EPiC: Ensemble of Partial Point Clouds for Robust Classification. (arXiv:2303.11419v1 [cs.CV])
    Robust point cloud classification is crucial for real-world applications, as consumer-type 3D sensors often yield partial and noisy data, degraded by various artifacts. In this work we propose a general ensemble framework, based on partial point cloud sampling. Each ensemble member is exposed to only partial input data. Three sampling strategies are used jointly, two local ones, based on patches and curves, and a global one of random sampling. We demonstrate the robustness of our method to various local and global degradations. We show that our framework significantly improves the robustness of top classification netowrks by a large margin. Our experimental setting uses the recently introduced ModelNet-C database by Ren et al.[24], where we reach SOTA both on unaugmented and on augmented data. Our unaugmented mean Corruption Error (mCE) is 0.64 (current SOTA is 0.86) and 0.50 for augmented data (current SOTA is 0.57). We analyze and explain these remarkable results through diversity analysis.
    Short-length SSVEP data extension by a novel generative adversarial networks based framework. (arXiv:2301.05599v3 [q-bio.NC] UPDATED)
    Steady-state visual evoked potentials (SSVEPs) based brain-computer interface (BCI) has received considerable attention due to its high information transfer rate (ITR) and available quantity of targets. However, the performance of frequency identification methods heavily hinges on the amount of user calibration data and data length, which hinders the deployment in real-world applications. Recently, generative adversarial networks (GANs)-based data generation methods have been widely adopted to create synthetic electroencephalography (EEG) data, holds promise to address these issues. In this paper, we proposed a GAN-based end-to-end signal transformation network for data length extension, termed as TEGAN. TEGAN transforms short-length SSVEP signals into long-length artificial SSVEP signals. By incorporating a novel U-Net generator architecture and an auxiliary classifier into the network architecture, the TEGAN could produce conditioned features in the synthetic data. Additionally, we introduced a two-stage training strategy and the LeCam-divergence regularization term to regularize the training process of GAN during the network implementation. The proposed TEGAN was evaluated on two public SSVEP datasets (a 4-class dataset and a 12-class dataset). With the assistance of TEGAN, the performance of traditional frequency recognition methods and deep learning-based methods have been significantly improved under limited calibration data. And the classification performance gap of various frequency recognition methods has been narrowed. This study substantiates the feasibility of the proposed method to extend the data length for short-time SSVEP signals for developing a high-performance BCI system. The proposed GAN-based methods have the great potential of shortening the calibration time and cutting down the budget for various real-world BCI-based applications.  ( 3 min )
    Rotating without Seeing: Towards In-hand Dexterity through Touch. (arXiv:2303.10880v2 [cs.RO] UPDATED)
    Tactile information plays a critical role in human dexterity. It reveals useful contact information that may not be inferred directly from vision. In fact, humans can even perform in-hand dexterous manipulation without using vision. Can we enable the same ability for the multi-finger robot hand? In this paper, we present Touch Dexterity, a new system that can perform in-hand object rotation using only touching without seeing the object. Instead of relying on precise tactile sensing in a small region, we introduce a new system design using dense binary force sensors (touch or no touch) overlaying one side of the whole robot hand (palm, finger links, fingertips). Such a design is low-cost, giving a larger coverage of the object, and minimizing the Sim2Real gap at the same time. We train an in-hand rotation policy using Reinforcement Learning on diverse objects in simulation. Relying on touch-only sensing, we can directly deploy the policy in a real robot hand and rotate novel objects that are not presented in training. Extensive ablations are performed on how tactile information help in-hand manipulation.Our project is available at https://touchdexterity.github.io.  ( 2 min )
    Unsupervised Cross-Domain Rumor Detection with Contrastive Learning and Cross-Attention. (arXiv:2303.11945v1 [cs.SI])
    Massive rumors usually appear along with breaking news or trending topics, seriously hindering the truth. Existing rumor detection methods are mostly focused on the same domain, and thus have poor performance in cross-domain scenarios due to domain shift. In this work, we propose an end-to-end instance-wise and prototype-wise contrastive learning model with a cross-attention mechanism for cross-domain rumor detection. The model not only performs cross-domain feature alignment but also enforces target samples to align with the corresponding prototypes of a given source domain. Since target labels in a target domain are unavailable, we use a clustering-based approach with carefully initialized centers by a batch of source domain samples to produce pseudo labels. Moreover, we use a cross-attention mechanism on a pair of source data and target data with the same labels to learn domain-invariant representations. Because the samples in a domain pair tend to express similar semantic patterns, especially on the people's attitudes (e.g., supporting or denying) towards the same category of rumors, the discrepancy between a pair of the source domain and target domain will be decreased. We conduct experiments on four groups of cross-domain datasets and show that our proposed model achieves state-of-the-art performance.  ( 2 min )
    Sharpness-aware Quantization for Deep Neural Networks. (arXiv:2111.12273v5 [cs.CV] UPDATED)
    Network quantization is a dominant paradigm of model compression. However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance. Recently, Sharpness-Aware Minimization (SAM) has been proposed to smooth the loss landscape and improve the generalization performance of the models. Nevertheless, directly applying SAM to the quantized models can lead to perturbation mismatch or diminishment issues, resulting in suboptimal performance. In this paper, we propose a novel method, dubbed Sharpness-Aware Quantization (SAQ), to explore the effect of SAM in model compression, particularly quantization for the first time. Specifically, we first provide a unified view of quantization and SAM by treating them as introducing quantization noises and adversarial perturbations to the model weights, respectively. According to whether the noise and perturbation terms depend on each other, SAQ can be formulated into three cases, which are analyzed and compared comprehensively. Furthermore, by introducing an efficient training strategy, SAQ only incurs a little additional training overhead compared with the default optimizer (e.g., SGD or AdamW). Extensive experiments on both convolutional neural networks and Transformers across various datasets (i.e., ImageNet, CIFAR-10/100, Oxford Flowers-102, Oxford-IIIT Pets) show that SAQ improves the generalization performance of the quantized models, yielding the SOTA results in uniform quantization. For example, on ImageNet, SAQ outperforms AdamW by 1.2% on the Top-1 accuracy for 4-bit ViT-B/16. Our 4-bit ResNet-50 surpasses the previous SOTA method by 0.9% on the Top-1 accuracy.  ( 3 min )
    Vision Transformer-based Model for Severity Quantification of Lung Pneumonia Using Chest X-ray Images. (arXiv:2303.11935v1 [eess.IV])
    To develop generic and reliable approaches for diagnosing and assessing the severity of COVID-19 from chest X-rays (CXR), a large number of well-maintained COVID-19 datasets are needed. Existing severity quantification architectures require expensive training calculations to achieve the best results. For healthcare professionals to quickly and automatically identify COVID-19 patients and predict associated severity indicators, computer utilities are needed. In this work, we propose a Vision Transformer (ViT)-based neural network model that relies on a small number of trainable parameters to quantify the severity of COVID-19 and other lung diseases. We present a feasible approach to quantify the severity of CXR, called Vision Transformer Regressor Infection Prediction (ViTReg-IP), derived from a ViT and a regression head. We investigate the generalization potential of our model using a variety of additional test chest radiograph datasets from different open sources. In this context, we performed a comparative study with several competing deep learning analysis methods. The experimental results show that our model can provide peak performance in quantifying severity with high generalizability at a relatively low computational cost. The source codes used in our work are publicly available at https://github.com/bouthainas/ViTReg-IP.  ( 3 min )
    GNN-Ensemble: Towards Random Decision Graph Neural Networks. (arXiv:2303.11376v1 [cs.LG])
    Graph Neural Networks (GNNs) have enjoyed wide spread applications in graph-structured data. However, existing graph based applications commonly lack annotated data. GNNs are required to learn latent patterns from a limited amount of training data to perform inferences on a vast amount of test data. The increased complexity of GNNs, as well as a single point of model parameter initialization, usually lead to overfitting and sub-optimal performance. In addition, it is known that GNNs are vulnerable to adversarial attacks. In this paper, we push one step forward on the ensemble learning of GNNs with improved accuracy, generalization, and adversarial robustness. Following the principles of stochastic modeling, we propose a new method called GNN-Ensemble to construct an ensemble of random decision graph neural networks whose capacity can be arbitrarily expanded for improvement in performance. The essence of the method is to build multiple GNNs in randomly selected substructures in the topological space and subfeatures in the feature space, and then combine them for final decision making. These GNNs in different substructure and subfeature spaces generalize their classification in complementary ways. Consequently, their combined classification performance can be improved and overfitting on the training data can be effectively reduced. In the meantime, we show that GNN-Ensemble can significantly improve the adversarial robustness against attacks on GNNs.
    GAMR: A Guided Attention Model for (visual) Reasoning. (arXiv:2206.04928v5 [cs.AI] UPDATED)
    Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Here, we present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR), which instantiates an active vision theory -- positing that the brain solves complex visual reasoning problems dynamically -- via sequences of attention shifts to select and route task-relevant visual information into memory. Experiments on an array of visual reasoning tasks and datasets demonstrate GAMR's ability to learn visual routines in a robust and sample-efficient manner. In addition, GAMR is shown to be capable of zero-shot generalization on completely novel reasoning tasks. Overall, our work provides computational support for cognitive theories that postulate the need for a critical interplay between attention and memory to dynamically maintain and manipulate task-relevant visual information to solve complex visual reasoning tasks.  ( 2 min )
    Whose Emotion Matters? Speaking Activity Localisation without Prior Knowledge. (arXiv:2211.15377v3 [eess.AS] UPDATED)
    The task of emotion recognition in conversations (ERC) benefits from the availability of multiple modalities, as provided, for example, in the video-based Multimodal EmotionLines Dataset (MELD). However, only a few research approaches use both acoustic and visual information from the MELD videos. There are two reasons for this: First, label-to-video alignments in MELD are noisy, making those videos an unreliable source of emotional speech data. Second, conversations can involve several people in the same scene, which requires the localisation of the utterance source. In this paper, we introduce MELD with Fixed Audiovisual Information via Realignment (MELD-FAIR) by using recent active speaker detection and automatic speech recognition models, we are able to realign the videos of MELD and capture the facial expressions from speakers in 96.92% of the utterances provided in MELD. Experiments with a self-supervised voice recognition model indicate that the realigned MELD-FAIR videos more closely match the transcribed utterances given in the MELD dataset. Finally, we devise a model for emotion recognition in conversations trained on the realigned MELD-FAIR videos, which outperforms state-of-the-art models for ERC based on vision alone. This indicates that localising the source of speaking activities is indeed effective for extracting facial expressions from the uttering speakers and that faces provide more informative visual cues than the visual features state-of-the-art models have been using so far. The MELD-FAIR realignment data, and the code of the realignment procedure and of the emotional recognition, are available at https://github.com/knowledgetechnologyuhh/MELD-FAIR.  ( 3 min )
    Sim-to-Real Transfer for Quadrupedal Locomotion via Terrain Transformer. (arXiv:2212.07740v2 [cs.RO] UPDATED)
    Deep reinforcement learning has recently emerged as an appealing alternative for legged locomotion over multiple terrains by training a policy in physical simulation and then transferring it to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks are still limited, which may hinder their applications in more complex environments. In contrast, the Transformer architecture has shown its superiority in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage Transformer in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which can naturally integrate Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including sand pit and stair down, which can not be accomplished by strong baselines.  ( 2 min )
    D3G: Learning Multi-robot Coordination from Demonstrations. (arXiv:2207.08892v2 [cs.RO] UPDATED)
    This paper develops a Distributed Differentiable Dynamic Game (D3G) framework, which enables learning multi-robot coordination from demonstrations. We represent multi-robot coordination as a dynamic game, where the behavior of a robot is dictated by its own dynamics and objective that also depends on others' behavior. The coordination thus can be adapted by tuning the objective and dynamics of each robot. The proposed D3G enables each robot to automatically tune its individual dynamics and objectives in a distributed manner by minimizing the mismatch between its trajectory and demonstrations. This learning framework features a new design, including a forward-pass, where all robots collaboratively seek Nash equilibrium of a game, and a backward-pass, where gradients are propagated via the communication graph. We test the D3G in simulation with two types of robots given different task configurations. The results validate the capability of D3G for learning multi-robot coordination from demonstrations.  ( 2 min )
    GPT-based Open-Ended Knowledge Tracing. (arXiv:2203.03716v4 [cs.CY] UPDATED)
    In education applications, knowledge tracing refers to the problem of estimating students' time-varying concept/skill mastery level from their past responses to questions and predicting their future performance. One key limitation of most existing knowledge tracing methods is that they treat student responses to questions as binary-valued, i.e., whether they are correct or incorrect. Response correctness analysis/prediction ignores important information on student knowledge contained in the exact content of the responses, especially for open-ended questions. In this paper, we conduct the first exploration into open-ended knowledge tracing (OKT) by studying the new task of predicting students' exact open-ended responses to questions. Our work is grounded in the domain of computer science education with programming questions. We develop an initial solution to the OKT problem, a student knowledge-guided code generation approach, that combines program synthesis methods using language models with student knowledge tracing methods. We also conduct a series of quantitative and qualitative experiments on a real-world student code dataset to validate OKT and demonstrate its promise in educational applications.  ( 2 min )
    Studying Limits of Explainability by Integrated Gradients for Gene Expression Models. (arXiv:2303.11336v1 [q-bio.GN])
    Understanding the molecular processes that drive cellular life is a fundamental question in biological research. Ambitious programs have gathered a number of molecular datasets on large populations. To decipher the complex cellular interactions, recent work has turned to supervised machine learning methods. The scientific questions are formulated as classical learning problems on tabular data or on graphs, e.g. phenotype prediction from gene expression data. In these works, the input features on which the individual predictions are predominantly based are often interpreted as indicative of the cause of the phenotype, such as cancer identification. Here, we propose to explore the relevance of the biomarkers identified by Integrated Gradients, an explainability method for feature attribution in machine learning. Through a motivating example on The Cancer Genome Atlas, we show that ranking features by importance is not enough to robustly identify biomarkers. As it is difficult to evaluate whether biomarkers reflect relevant causes without known ground truth, we simulate gene expression data by proposing a hierarchical model based on Latent Dirichlet Allocation models. We also highlight good practices for evaluating explanations for genomics data and propose a direction to derive more insights from these explanations.  ( 2 min )
    Optimized preprocessing and Tiny ML for Attention State Classification. (arXiv:2303.11371v1 [cs.LG])
    In this paper, we present a new approach to mental state classification from EEG signals by combining signal processing techniques and machine learning (ML) algorithms. We evaluate the performance of the proposed method on a dataset of EEG recordings collected during a cognitive load task and compared it to other state-of-the-art methods. The results show that the proposed method achieves high accuracy in classifying mental states and outperforms state-of-the-art methods in terms of classification accuracy and computational efficiency.  ( 2 min )
    Machine learning-based detection of cardiovascular disease using ECG signals: performance vs. complexity. (arXiv:2303.11429v1 [eess.SP])
    Cardiovascular disease remains a significant problem in modern society. Among non-invasive techniques, the electrocardiogram (ECG) is one of the most reliable methods for detecting abnormalities in cardiac activities. However, ECG interpretation requires expert knowledge and it is time-consuming. Developing a novel method to detect the disease early could prevent death and complication. The paper presents novel various approaches for classifying cardiac diseases from ECG recordings. The first approach suggests the Poincare representation of ECG signal and deep-learning-based image classifiers (ResNet50 and DenseNet121 were learned over Poincare diagrams), which showed decent performance in predicting AF (atrial fibrillation) but not other types of arrhythmia. XGBoost, a gradient-boosting model, showed an acceptable performance in long-term data but had a long inference time due to highly-consuming calculation within the pre-processing phase. Finally, the 1D convolutional model, specifically the 1D ResNet, showed the best results in both studied CinC 2017 and CinC 2020 datasets, reaching the F1 score of 85% and 71%, respectively, and that was superior to the first-ranking solution of each challenge. The paper also investigated efficiency metrics such as power consumption and equivalent CO2 emissions, with one-dimensional models like 1D CNN and 1D ResNet being the most energy efficient. Model interpretation analysis showed that the DenseNet detected AF using heart rate variability while the 1DResNet assessed AF pattern in raw ECG signals.  ( 2 min )
    eP-ALM: Efficient Perceptual Augmentation of Language Models. (arXiv:2303.11403v1 [cs.CV])
    Large Language Models (LLMs) have so far impressed the world, with unprecedented capabilities that emerge in models at large scales. On the vision side, transformer models (i.e., ViT) are following the same trend, achieving the best performance on challenging benchmarks. With the abundance of such unimodal models, a natural question arises; do we need also to follow this trend to tackle multimodal tasks? In this work, we propose to rather direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. In particular, they still train a large number of parameters, rely on large multimodal pretraining, use encoders (e.g., CLIP) trained on huge image-text datasets, and add significant inference overhead. In addition, most of these approaches have focused on Zero-Shot and In Context Learning, with little to no effort on direct finetuning. We investigate the minimal computational effort needed to adapt unimodal models for multimodal tasks and propose a new challenging setup, alongside different approaches, that efficiently adapts unimodal pretrained models. We show that by freezing more than 99\% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning across Image, Video, and Audio modalities, following the proposed setup. The code will be available here: https://github.com/mshukor/eP-ALM.  ( 2 min )
  • Open

    How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer. (arXiv:2303.11454v1 [cs.LG])
    Randomized neural networks (randomized NNs), where only the terminal layer's weights are optimized constitute a powerful model class to reduce computational time in training the neural network model. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in function space) of the generalization behavior of randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs correspond to a generalized additive model (GAM)-typed regression in which infinitely many directions are considered: the infinite generalized additive model (IGAM). The IGAM is formalized as solution to an optimization problem in function space for a specific regularization functional and a fairly general loss. This work is an extension to multivariate NNs of prior work, where we showed how wide RSNs with ReLU activation behave like spline regression under certain conditions and if the input is one-dimensional.  ( 2 min )
    Exact Non-Oblivious Performance of Rademacher Random Embeddings. (arXiv:2303.11774v1 [cs.LG])
    This paper revisits the performance of Rademacher random projections, establishing novel statistical guarantees that are numerically sharp and non-oblivious with respect to the input data. More specifically, the central result is the Schur-concavity property of Rademacher random projections with respect to the inputs. This offers a novel geometric perspective on the performance of random projections, while improving quantitatively on bounds from previous works. As a corollary of this broader result, we obtained the improved performance on data which is sparse or is distributed with small spread. This non-oblivious analysis is a novelty compared to techniques from previous work, and bridges the frequently observed gap between theory and practise. The main result uses an algebraic framework for proving Schur-concavity properties, which is a contribution of independent interest and an elegant alternative to derivative-based criteria.  ( 2 min )
    Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing and Neural Networks with Quadratic Activations. (arXiv:2303.11453v1 [cs.LG])
    Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. Several practical studies have shown that pruning an overparameterized model and fine-tuning generalizes well to new samples. Although the above pipeline, which we refer to as pruning + fine-tuning, has been extremely successful in lowering the complexity of trained models, there is very little known about the theory behind this success. In this paper we address this issue by investigating the pruning + fine-tuning framework on the overparameterized matrix sensing problem, with the ground truth denoted $U_\star \in \mathbb{R}^{d \times r}$ and the overparameterized model $U \in \mathbb{R}^{d \times k}$ with $k \gg r$. We study the approximate local minima of the empirical mean square error, augmented with a smooth version of a group Lasso regularizer, $\sum_{i=1}^k \| U e_i \|_2$ and show that pruning the low $\ell_2$-norm columns results in a solution $U_{\text{prune}}$ which has the minimum number of columns $r$, yet is close to the ground truth in training loss. Initializing the subsequent fine-tuning phase from $U_{\text{prune}}$, the resulting solution converges linearly to a generalization error of $O(\sqrt{rd/n})$ ignoring lower order terms, which is statistically optimal. While our analysis provides insights into the role of regularization in pruning, we also show that running gradient descent in the absence of regularization results in models which {are not suitable for greedy pruning}, i.e., many columns could have their $\ell_2$ norm comparable to that of the maximum. Lastly, we extend our results for the training and pruning of two-layer neural networks with quadratic activation functions. Our results provide the first rigorous insights on why greedy pruning + fine-tuning leads to smaller models which also generalize well.  ( 3 min )
    Formulation of Weighted Average Smoothing as a Projection of the Origin onto a Convex Polytope. (arXiv:2303.11958v1 [cs.LG])
    Our study focuses on determining the best weight windows for a weighted moving average smoother under squared loss. We show that there exists an optimal weight window that is symmetrical around its center. We study the class of tapered weight windows, which decrease in weight as they move away from the center. We formulate the corresponding least squares problem as a quadratic program and finally as a projection of the origin onto a convex polytope. Additionally, we provide some analytical solutions to the best window when some conditions are met on the input data.  ( 2 min )
    Fighting Money Laundering with Statistics and Machine Learning. (arXiv:2201.04207v5 [stat.ML] UPDATED)
    Money laundering is a profound global problem. Nonetheless, there is little scientific literature on statistical and machine learning methods for anti-money laundering. In this paper, we focus on anti-money laundering in banks and provide an introduction and review of the literature. We propose a unifying terminology with two central elements: (i) client risk profiling and (ii) suspicious behavior flagging. We find that client risk profiling is characterized by diagnostics, i.e., efforts to find and explain risk factors. On the other hand, suspicious behavior flagging is characterized by non-disclosed features and hand-crafted risk indices. Finally, we discuss directions for future research. One major challenge is the need for more public data sets. This may potentially be addressed by synthetic data generation. Other possible research directions include semi-supervised and deep learning, interpretability, and fairness of the results.  ( 2 min )
    Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles. (arXiv:1703.01347v3 [cs.AI] UPDATED)
    We study contextual linear bandit problems under feature uncertainty; they are noisy with missing entries. To address the challenges of the noise, we analyze Bayesian oracles given observed noisy features. Our Bayesian analysis finds that the optimal hypothesis can be far from the underlying realizability function, depending on the noise characteristics, which are highly non-intuitive and do not occur for classical noiseless setups. This implies that classical approaches cannot guarantee a non-trivial regret bound. Therefore, we propose an algorithm that aims at the Bayesian oracle from observed information under this model, achieving $\tilde{O}(d\sqrt{T})$ regret bound when there is a large number of arms. We demonstrate the proposed algorithm using synthetic and real-world datasets.
    Uniform Risk Bounds for Learning with Dependent Data Sequences. (arXiv:2303.11650v1 [cs.LG])
    This paper extends standard results from learning theory with independent data to sequences of dependent data. Contrary to most of the literature, we do not rely on mixing arguments or sequential measures of complexity and derive uniform risk bounds with classical proof patterns and capacity measures. In particular, we show that the standard classification risk bounds based on the VC-dimension hold in the exact same form for dependent data, and further provide Rademacher complexity-based bounds, that remain unchanged compared to the standard results for the identically and independently distributed case. Finally, we show how to apply these results in the context of scenario-based optimization in order to compute the sample complexity of random programs with dependent constraints.  ( 2 min )
    Lipschitz-bounded 1D convolutional neural networks using the Cayley transform and the controllability Gramian. (arXiv:2303.11835v1 [cs.LG])
    We establish a layer-wise parameterization for 1D convolutional neural networks (CNNs) with built-in end-to-end robustness guarantees. Herein, we use the Lipschitz constant of the input-output mapping characterized by a CNN as a robustness measure. We base our parameterization on the Cayley transform that parameterizes orthogonal matrices and the controllability Gramian for the state space representation of the convolutional layers. The proposed parameterization by design fulfills linear matrix inequalities that are sufficient for Lipschitz continuity of the CNN, which further enables unconstrained training of Lipschitz-bounded 1D CNNs. Finally, we train Lipschitz-bounded 1D CNNs for the classification of heart arrythmia data and show their improved robustness.
    Blow-up Algorithm for Sum-of-Products Polynomials and Real Log Canonical Thresholds. (arXiv:2303.11619v1 [math.ST])
    When considering a real log canonical threshold (RLCT) that gives a Bayesian generalization error, in general, papers replace a mean error function with a relatively simple polynomial whose RLCT corresponds to that of the mean error function, and obtain its RLCT by resolving its singularities through an algebraic operation called blow-up. Though it is known that the singularities of any polynomial can be resolved by a finite number of blow-up iterations, it is not clarified whether or not it is possible to resolve singularities of a specific polynomial by applying a specific blow-up algorithm. Therefore this paper considers the blow-up algorithm for the polynomials called sum-of-products (sop) polynomials and its RLCT.  ( 2 min )
    Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression. (arXiv:2211.07484v2 [cs.LG] UPDATED)
    We consider a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We present a new algorithm that is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for CBwK when an algorithm must stop once some constraint is violated. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019) , a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.  ( 2 min )
    Solving High-Dimensional Inverse Problems with Auxiliary Uncertainty via Operator Learning with Limited Data. (arXiv:2303.11379v1 [stat.ML])
    In complex large-scale systems such as climate, important effects are caused by a combination of confounding processes that are not fully observable. The identification of sources from observations of system state is vital for attribution and prediction, which inform critical policy decisions. The difficulty of these types of inverse problems lies in the inability to isolate sources and the cost of simulating computational models. Surrogate models may enable the many-query algorithms required for source identification, but data challenges arise from high dimensionality of the state and source, limited ensembles of costly model simulations to train a surrogate model, and few and potentially noisy state observations for inversion due to measurement limitations. The influence of auxiliary processes adds an additional layer of uncertainty that further confounds source identification. We introduce a framework based on (1) calibrating deep neural network surrogates to the flow maps provided by an ensemble of simulations obtained by varying sources, and (2) using these surrogates in a Bayesian framework to identify sources from observations via optimization. Focusing on an atmospheric dispersion exemplar, we find that the expressive and computationally efficient nature of the deep neural network operator surrogates in appropriately reduced dimension allows for source identification with uncertainty quantification using limited data. Introducing a variable wind field as an auxiliary process, we find that a Bayesian approximation error approach is essential for reliable source inversion when uncertainty due to wind stresses the algorithm.  ( 2 min )
    Skeleton Regression: A Graph-Based Approach to Estimation with Manifold Structure. (arXiv:2303.11786v1 [cs.LG])
    We introduce a new regression framework designed to deal with large-scale, complex data that lies around a low-dimensional manifold. Our approach first constructs a graph representation, referred to as the skeleton, to capture the underlying geometric structure. We then define metrics on the skeleton graph and apply nonparametric regression techniques, along with feature transformations based on the graph, to estimate the regression function. In addition to the included nonparametric methods, we also discuss the limitations of some nonparametric regressors with respect to the general metric space such as the skeleton graph. The proposed regression framework allows us to bypass the curse of dimensionality and provides additional advantages that it can handle the union of multiple manifolds and is robust to additive noise and noisy observations. We provide statistical guarantees for the proposed method and demonstrate its effectiveness through simulations and real data examples.  ( 2 min )
    Universal Smoothed Score Functions for Generative Modeling. (arXiv:2303.11669v1 [stat.ML])
    We consider the problem of generative modeling based on smoothing an unknown density of interest in $\mathbb{R}^d$ using factorial kernels with $M$ independent Gaussian channels with equal noise levels introduced by Saremi and Srivastava (2022). First, we fully characterize the time complexity of learning the resulting smoothed density in $\mathbb{R}^{Md}$, called M-density, by deriving a universal form for its parametrization in which the score function is by construction permutation equivariant. Next, we study the time complexity of sampling an M-density by analyzing its condition number for Gaussian distributions. This spectral analysis gives a geometric insight on the "shape" of M-densities as one increases $M$. Finally, we present results on the sample quality in this class of generative models on the CIFAR-10 dataset where we report Fr\'echet inception distances (14.15), notably obtained with a single noise level on long-run fast-mixing MCMC chains.  ( 2 min )
    Adaptive Experimentation at Scale: Bayesian Algorithms for Flexible Batches. (arXiv:2303.11582v1 [cs.LG])
    Standard bandit algorithms that assume continual reallocation of measurement effort are challenging to implement due to delayed feedback and infrastructural/organizational difficulties. Motivated by practical instances involving a handful of reallocation epochs in which outcomes are measured in batches, we develop a new adaptive experimentation framework that can flexibly handle any batch size. Our main observation is that normal approximations universal in statistical inference can also guide the design of scalable adaptive designs. By deriving an asymptotic sequential experiment, we formulate a dynamic program that can leverage prior information on average rewards. State transitions of the dynamic program are differentiable with respect to the sampling allocations, allowing the use of gradient-based methods for planning and policy optimization. We propose a simple iterative planning method, Residual Horizon Optimization, which selects sampling allocations by optimizing a planning objective via stochastic gradient-based methods. Our method significantly improves statistical power over standard adaptive policies, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings which are difficult for standard adaptive policies, including problems with a small number of reallocation epochs, low signal-to-noise ratio, and unknown reward distributions.  ( 2 min )
    Valid Inference after Causal Discovery. (arXiv:2208.05949v2 [stat.ME] UPDATED)
    Causal discovery and causal effect estimation are two fundamental tasks in causal inference. While many methods have been developed for each task individually, statistical challenges arise when applying these methods jointly: estimating causal effects after running causal discovery algorithms on the same data leads to "double dipping," invalidating the coverage guarantees of classical confidence intervals. To this end, we develop tools for valid post-causal-discovery inference. Across empirical studies, we show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates; on the other hand, applying our method provides reliable coverage while achieving more accurate causal discovery than data splitting.  ( 2 min )
    Geometrical aspects of lattice gauge equivariant convolutional neural networks. (arXiv:2303.11448v1 [hep-lat])
    Lattice gauge equivariant convolutional neural networks (L-CNNs) are a framework for convolutional neural networks that can be applied to non-Abelian lattice gauge theories without violating gauge symmetry. We demonstrate how L-CNNs can be equipped with global group equivariance. This allows us to extend the formulation to be equivariant not just under translations but under global lattice symmetries such as rotations and reflections. Additionally, we provide a geometric formulation of L-CNNs and show how convolutions in L-CNNs arise as a special case of gauge equivariant neural networks on SU($N$) principal bundles.  ( 2 min )
    Environmental Sensor Placement with Convolutional Gaussian Neural Processes. (arXiv:2211.10381v3 [stat.ML] UPDATED)
    Environmental sensors are crucial for monitoring weather conditions and the impacts of climate change. However, it is challenging to maximise measurement informativeness and place sensors efficiently, particularly in remote regions like Antarctica. Probabilistic machine learning models can evaluate placement informativeness by predicting the uncertainty reduction provided by a new sensor. Gaussian process (GP) models are widely used for this purpose, but they struggle with capturing complex non-stationary behaviour and scaling to large datasets. This paper proposes using a convolutional Gaussian neural process (ConvGNP) to address these issues. A ConvGNP uses neural networks to parameterise a joint Gaussian distribution at arbitrary target locations, enabling flexibility and scalability. Using simulated surface air temperature anomaly over Antarctica as ground truth, the ConvGNP learns spatial and seasonal non-stationarities, outperforming a non-stationary GP baseline. In a simulated sensor placement experiment, the ConvGNP better predicts the performance boost obtained from new observations than GP baselines, leading to more informative sensor placements. We connect our work with similar machine learning and physics-based approaches and discuss steps towards an operational sensor placement recommendation system.  ( 2 min )
    Doubly Regularized Entropic Wasserstein Barycenters. (arXiv:2303.11844v1 [math.OC])
    We study a general formulation of regularized Wasserstein barycenters that enjoys favorable regularity, approximation, stability and (grid-free) optimization properties. This barycenter is defined as the unique probability measure that minimizes the sum of entropic optimal transport (EOT) costs with respect to a family of given probability measures, plus an entropy term. We denote it $(\lambda,\tau)$-barycenter, where $\lambda$ is the inner regularization strength and $\tau$ the outer one. This formulation recovers several previously proposed EOT barycenters for various choices of $\lambda,\tau \geq 0$ and generalizes them. First, in spite of -- and in fact owing to -- being \emph{doubly} regularized, we show that our formulation is debiased for $\tau=\lambda/2$: the suboptimality in the (unregularized) Wasserstein barycenter objective is, for smooth densities, of the order of the strength $\lambda^2$ of entropic regularization, instead of $\max\{\lambda,\tau\}$ in general. We discuss this phenomenon for isotropic Gaussians where all $(\lambda,\tau)$-barycenters have closed form. Second, we show that for $\lambda,\tau>0$, this barycenter has a smooth density and is strongly stable under perturbation of the marginals. In particular, it can be estimated efficiently: given $n$ samples from each of the probability measures, it converges in relative entropy to the population barycenter at a rate $n^{-1/2}$. And finally, this formulation lends itself naturally to a grid-free optimization algorithm: we propose a simple \emph{noisy particle gradient descent} which, in the mean-field limit, converges globally at an exponential rate to the barycenter.  ( 2 min )
    Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates. (arXiv:2201.01652v3 [math.OC] UPDATED)
    Stochastic majorization-minimization (SMM) is a class of stochastic optimization algorithms that proceed by sampling new data points and minimizing a recursive average of surrogate functions of an objective function. The surrogates are required to be strongly convex and convergence rate analysis for the general non-convex setting was not available. In this paper, we propose an extension of SMM where surrogates are allowed to be only weakly convex or block multi-convex, and the averaged surrogates are approximately minimized with proximal regularization or block-minimized within diminishing radii, respectively. For the general nonconvex constrained setting with non-i.i.d. data samples, we show that the first-order optimality gap of the proposed algorithm decays at the rate $O((\log n)^{1+\epsilon}/n^{1/2})$ for the empirical loss and $O((\log n)^{1+\epsilon}/n^{1/4})$ for the expected loss, where $n$ denotes the number of data samples processed. Under some additional assumption, the latter convergence rate can be improved to $O((\log n)^{1+\epsilon}/n^{1/2})$. As a corollary, we obtain the first convergence rate bounds for various optimization methods under general nonconvex dependent data setting: Double-averaging projected gradient descent and its generalizations, proximal point empirical risk minimization, and online matrix/tensor decomposition algorithms. We also provide experimental validation of our results.  ( 2 min )
    Graph Kalman Filters. (arXiv:2303.12021v1 [cs.LG])
    The well-known Kalman filters model dynamical systems by relying on state-space representations with the next state updated, and its uncertainty controlled, by fresh information associated with newly observed system outputs. This paper generalizes, for the first time in the literature, Kalman and extended Kalman filters to discrete-time settings where inputs, states, and outputs are represented as attributed graphs whose topology and attributes can change with time. The setup allows us to adapt the framework to cases where the output is a vector or a scalar too (node/graph level tasks). Within the proposed theoretical framework, the unknown state-transition and the readout functions are learned end-to-end along with the downstream prediction task.  ( 2 min )

  • Open

    [D] what stories or literature best illustrate the importance of choosing the right objective function?
    I don’t mean L1 or L2 error but more capturing what matters most. I’m trying to cite sources and preparing for a few presentations where we test different target variables that are forms of the same variable but some are regressed easier than others due to how they are normalized, transformed, relative to the time of prediction, or seasonality removed. I’m looking to find illustrative stories and literature that describe the importance of carefully selecting the target variable or an intermediate in open-ended problem solving. Thanks for any thoughts. submitted by /u/PShermanDDS [link] [comments]  ( 43 min )
    SmartyGPT: now with ChatGPT and GPT4 [P]
    I want to announce that we have released v1.1.0 which includes access for ChatGPT and GPT4 for Plus suscribers! :) https://github.com/citiususc/Smarty-GPT submitted by /u/usc-ur [link] [comments]  ( 43 min )
    [D] Is attention ALIBI Attention with Linear Biases implemented in both decoder and encoder?
    I've read the paper, yet I cannot seem to find an explicit statement of where in the transformers architecture the ALIBI mask is implemented. Thanks submitted by /u/AlternativeDish5596 [link] [comments]  ( 43 min )
    [D] Implementation of NER Model with relationship extraction?
    I want to train a model, which can extract information from a perfume description What it should extract are the perfume notes and the type (top, heart, base notes) related to the note submitted by /u/uncutzwiebel [link] [comments]  ( 6 min )
    [D] [P] Curating open-source projects and community demos around GPT-4
    There are many open-source projects and indie-built demos around the GPT-4 API. Despite the recent shift of OpenAI toward closure, open demos are always advancing the field and inspiring creativity. Here are some community projects that I find particularly interesting: https://github.com/radi-cho/awesome-gpt4. Feel free to share the things you've been building or something you've been fascinated about on social media either by joining the discussion here or by contributing to the repository:) submitted by /u/radi-cho [link] [comments]  ( 43 min )
    [D] fx= (1/sigma√2π)*e^(-(x-u)^2/2sigma^2)
    How is this formula to calculate likelihood used in logistic regression to find the maximum likelihood ? Because I have seen videos where people calculated the likelihood by converting log(odds) to probabilities using the sigmoid function. Is this formula used in anyway to find the likelihood in logistic regression. submitted by /u/emo-9 [link] [comments]  ( 43 min )
    [P] Run LLaMA LLM chatbots on any cloud with one click
    We made a *basic* chatbot based on LLaMA models; code here: https://github.com/skypilot-org/skypilot/tree/master/examples/llama-llm-chatbots https://github.com/skypilot-org/sky-llama A detailed post on how to run it on the cloud (Lambda Cloud, AWS, GCP, Azure) with 1 command: https://blog.skypilot.co/llama-llm-chatbots-on-any-cloud/ Would love to hear your thoughts. Although people are making LLMs run on laptops and other devices ({llama,alpaca}.cpp}, we think that as more open and compute-hungry LLMs emerge, it's increasingly important to finetune them and that's where getting powerful cloud compute in flexible locations comes into play. submitted by /u/z_yang [link] [comments]  ( 43 min )
    [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models
    Hey everyone! Cerebras is excited to share that our sparsity paper is now available on arxiv and has been accepted into the ICLR 2023 Sparsity in Neural Networks workshop! This research demonstrates the ability to pre-train large GPT models with high levels of sparsity followed by dense fine-tuning to maintain accuracy on downstream tasks. We achieved this using Cerebras CS-2, a system that accelerates unstructured sparsity and allows exploration of machine learning techniques at a larger scale than previously possible. The researchers used simple, static sparsity and evaluated model sizes up to GPT-3 XL with 1.3B parameters. We were able to pre-train GPT-3 XL with up to 75% unstructured sparsity, and 60% fewer training FLOPS on Cerebras CS-2. These findings show the promise of sparse training and motivate exploration of more advanced sparse techniques for even larger models. This is the first time a large GPT model has been pre-trained with high sparsity without significant loss in downstream task metrics, and the results are exciting for the industry as it offers a fundamental enabler to reduce the compute to train these models. submitted by /u/CS-fan-101 [link] [comments]  ( 44 min )
    [D] Running an LLM on "low" compute power machines?
    It's understandable that companies like OpenAI would want to charge for access to their projects due to the ongoing cost to train then run them, I assume most other projects that require as much power and have to run in the cloud will do the same. I was wondering if there were any projects to run/train some kind of language model/AI chatbot on consumer hardware (like a single GPU)? I heard that since Facebook's LLama leaked people managed to get it running on even hardware like an rpi, albeit slowly, I'm not asking to link to leaked data but if there are any projects attempting to achieve a goal like running locally on consumer hardware. submitted by /u/Qwillbehr [link] [comments]  ( 45 min )
    Save money with Spot - blog [News]
    https://medium.com/coiled-computing/save-money-with-spot-d499edd46ae7 submitted by /u/dask-jeeves [link] [comments]  ( 43 min )
    [D] Largest available encoder-only model for text?
    We are witnessing a lot of progress on the encoder-decoder or decoder only settings. Any progress on encoder-only model such as BERT ? Is BERT-large the only publicly available largest encoder only model that is widely tested ? ​ Edit: I'm aware of Megatron BERT by NVIDIA that was supposedly a 3B parameter encoder model but that doesn't seem to be publicly available or heavily cited. Not sure how reliable that is. submitted by /u/Emergency_Apricot_77 [link] [comments]  ( 43 min )
    [D] Overview of advancements in Graph Neural Networks
    Hi all, Recently I’ve been getting up to speed on Graph Neural Networks (GNN) and the results of applying them vs. other methods. I’ve found that GNN approaches made impressive strides and have led to some really substantial performance jumps in actual production models in the industry. A new GNN-based model for estimating the time of arrival within Google Maps (with accuracy improvements up to 50%) is just one of many interesting examples. This pushed me to summarize my findings and write up a blog post on the state of Graph Neural Networks in 2023. It provides an overview of the recent advancements that GNNs have made in industry, as well as a couple of impressive statistics about their current place in the AI research landscape. I plan to write more articles on GNNs, like a short series that will dive into technical details of how GNNs work and more, so if you enjoy this article please let me know so I can be sure to develop the rest of the series and publish soon! Would be very helpful to hear your thoughts on this. submitted by /u/mrx-ai [link] [comments]  ( 47 min )
    [p] detrex v0.3.0: A Major Update to Our Detection Transformer Codebase!
    Happy to release our detrex v0.3.0, after two months, there're lots of new algorithms, baselines and useful tools have been supported in detrex. We will introduce them later. Here's our github link: https://github.com/IDEA-Research/detrex If our repo can help you for your research, please consider give us a star~ What's New New Algorithms and Pre-Trained Models The newly supported algorithms in detrex v0.3.0: - PnP-DETR (ICCV' 2021) - Anchor-DETR (AAAI' 2022) - DETA (ArXiv' 2022) - MaskDINO (CVPR' 2023) As for now, our detrex support 14 detection transformer algorithms. And after lots of experiments, we release a set of new baselines, which are all better than their official implementation Model Box AP (detrex) Box AP (official) DETA-R50 50.2 (+0.1) 50.1 H-Deformable-DETR-R50 49.1 (+0.4) 48.7 DINO-R50 49.4 (+0.4) 49.0 And we also release more than 30+ pretrained weights (including the converted weights) for the community research. All the pretrained model can be downloaded from our Model Zoo DINO detector with ViT backbone Model Lr Sched Box AP DINO-ViTDet-Base (4scale) 12 50.2 DINO-ViTDet-Base (4scale) 50 55.0 DINO-ViTDet-Large (4scale) 12 52.9 DINO-ViTDet-Large (4scale) 50 57.5 DINO detector with strong FocalNet backbone Model Lr Sched Box AP DINO-FocalNet-Large-3level (4scale) 12 57.5 DINO-FocalNet-Large-4level (4scale) 12 58.0 DINO-FocalNet-Large-4level (5scale) 12 58.5 New Training Techniques Supported Mixed Precision Training by setting train.amp.enabled=True: reduce 20% to 30% GPU memory usage. Supported EMAHook by setting train.model_ema.enabled=True: which can further enhance the model performance. Model Lr Sched Box AP DINO-R50 w/o EMA 12 49.0 DINO-R50 with EMA 12 49.4 (+0.4) Supported Encoder-Decoder Gradient Checkpoint in DINO detector which can further reduce 30% GPU memory usage. Support wandb logger Support a great slurm training scripts submitted by /u/Technical-Vast1314 [link] [comments]  ( 44 min )
    [R] Comparison of XAI libraries for Computer Vision
    I am working on an open-source XAI library aggregator and made a comparison of currently active XAI repositories. I am sure somebody will find this useful :) Feature pytorch-grad-cam Captum pytorch-cnn-visualizations torch-cam innvestigate tf-explain OmniXAI Xplique M3d-Cam Repository has library API ✔ ✔ ✘ ✔ ✔ ✔ ✔ ✔ ✔ Contains tutorials or examples ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ Documentation contains API reference ✘ ✔ ✘ ✔ ✔ ✔✘ ✔ ✔ ✔ Works with 2D images ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ Works with 3D images ✘ ✘ ✘ ✘ ✘ ✘ ✘ ✘ ✔ Activation visualization ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ Overlaying heatmap visualization ✔ ✔ ✔ ✔ ✘ ✔ ✔ ✔ ✔ Explain image classification ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ Explain object detection ✔ ✔ ✘ ✘ ✘ ✘ ✘ ✘ ✘ Explain semantic segmentation ✔ ✘ ✘ ✘ ✘ ✘ ✘ ✘ ✔ Explain panoptic segmentation ✘ ✘ ✘ ✘ ✘ ✘ ✘ ✘ ✘ GPU acceleration ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ Trust metrics ✔ ✔ ✘ ✘ ✘ ✘ ✘ ✔ ✘ PyTorch support ✔ ✔ ✔ ✔ ✘ ✘ ✔ ✘ ✔ TensorFlow support ✘ ✘ ✘ ✘ ✔ ✔ ✔ ✔ ✘ Concept Activation Vector ✘ ✔ ✘ ✘ ✘ ✘ ✘ ✔ ✘ Feature visualization ✘ ✘ ✔ ✘ ✘ ✘ ✘ ✔ ✘ DeepDream ✘ ✘ ✔ ✘ ✘ ✘ ✘ ✘ ✘ Influential Examples ✘ ✔ ✘ ✘ ✘ ✘ ✘ ✘ ✘ Interactive explainer app ✘ ✔ ✘ ✔ ✘ ✘ ✔ ✘ ✘ Adversarial attacks ✘ ✔ ✘ ✘ ✘ ✘ ✔ ✘ ✘ Model’s attributes ✔ ✔ ✔ ✘ ✔ ✔ ✔ ✔ ✔ Layer’s attributes ✔ ✔ ✔ ✔ ✔ ✘ ✔ ✔ ✔ Neuron’s attributes ✘ ✔ ✘ ✘ ✔ ✘ ✘ ✔ ✘ Integration with experiment tracker ✘ ✘ ✘ ✘ ✘ ✔ ✘ ✘ ✘ submitted by /u/adamwawrzynski [link] [comments]  ( 43 min )
    [D] Does label distribution affect models performance, especially for deep learning models?
    Currently I'm working on a project where the dataset are quite imbalanced (most and least class can be 10 times in size). The data and labels of the project are actually a time series, so naturally, when I'm splitting the samples according to time, the label distribution of different class are quite different. For example, in validation set, class 2 can be 10 times of class 1, but only 5 times in testing set. My supervisor asked me to make every sets to have the same label distribution (e.g. class 2 should always have roughly 5 times of class 1). I'm quite frustrated by this argument. Not only this makes cross-validation very annoying as it means every splits I must readjust the label distribution quite bruteforcely; this redistributing also meant that I will have to alter the testing set which in my undestanding is quite a big no-no; (and also the fact that I can't really turn down this argument as the problem does exist and I couldn't provide a better solution). So I want to know do people actually re-adjust the distribution of labels? Because I can't quite find any readings regarding such problems. In my current understanding, a good loss should make the models to "immune" to difference in label distribution. In the project, we have used label-distribution-aware margin (LDAM) loss. But in the paper, the author only experiments with imbalanced data. submitted by /u/Jacfger [link] [comments]  ( 46 min )
    [D] Stable Diffusion + 3D model texture generate
    Transform the way you create textures for your 3D models with our AI-based texture generator plugin for Stable Diffusion! Support our Kickstarter campaign now and be among the first to experience this game-changing software. https://www.kickstarter.com/projects/cyberomance/3diffusetexai-ai-powered-texture-generator-for-3d-models?ref=project_build submitted by /u/DarkVam0 [link] [comments]  ( 6 min )
    Zero-1-to-3: Zero-shot One Image to 3D Object
    submitted by /u/Biosphere_Collapse [link] [comments]  ( 6 min )
    [Project] Machine Learning for Audio: A library for audio analysis, feature extraction, etc
    audioflux is a deep learning tool library for audio and music analysis, feature extraction. It supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations. It can be provided to deep learning networks for training, and is used to study various tasks in the audio field such as Classification, Separation, Music Information Retrieval(MIR) and ASR etc. Source Code: https://github.com/libAudioFlux/audioFlux submitted by /u/Leo_D517 [link] [comments]  ( 47 min )
  • Open

    There’s a woset in my reposit
    The other day I was talking with someone I met while I was doing my postdoc at Vanderbilt and Eric Schechter’s book Handbook of Analysis and its Foundations came up. Eric was writing that book while we were there. He kindly listed me in the acknowledgements for having reviewed a few pages of the book. […] There’s a woset in my reposit first appeared on John D. Cook.  ( 5 min )
  • Open

    7 powerful tips to make AI your most PROFITABLE employee
    GPT-4 is insanely powerful. 99% are using it the wrong way. Context: I’ll be using ChatGPT’s version of GPT-4 to show its power. These prompt principles will change the game for your writing. 1/ Make AI an actor Give the AI a role. Tell it the persona you need it to emulate. If you leave this up to chance, AI will make too many assumptions. For example: "Act like you're a first-time founder of a software company." 2/ Give unique instructions ChatGPT can imitate common styles of writing. It can even follow well-known formulas and frameworks. For example: Tell ChatGPT to write a headline using principles from "Breakthrough Advertising". 3/ Go heavy on context Bad prompt, bad result. Simple as that. The more context you give, the better your responses will be. Try giving basic context like: → Tone → Length → Target audience → Desired outcome → Where the content will go 4/ Break your request into smaller parts If you ask for a whole blog post at once, it won't be as good. If you break the request into smaller parts, you can get great content out of it. A good rule of thumb is to ask for a max of 300 words at a time. 5/ Segment your asks Most requests in ChatGPT fall into 1/3 buckets: → Research → Creation → Improvement Research prompts can have less context. Creation prompts need the most context. Improvement prompts require the most back and forth. 6/ Ask for improvements If you don't correct the AI explicitly, it will assume it's on the right track. If outputs are aligning with expectations, it's the prompt's fault. Fix it by stating what you want changed in the chat. Your prompts determine your results. 7/ Make AI ask you the questions Maybe the most underrated tip here. If you're really stuck, or just want to look at a problem through different angles... Make AI think of all the angles for you. Working from a list of questions helps you generate more ideas yourself. submitted by /u/MrJamesOrr [link] [comments]  ( 43 min )
    Watch while you're High - Trippy Mushrooms - Art Nouveau (AI Dream 200)
    submitted by /u/LordPewPew777 [link] [comments]  ( 41 min )
    GPT4 Lied to Me!! 🤣 And Then Apologized for the ERROR!
    submitted by /u/manhesh [link] [comments]  ( 41 min )
    Can bard ai solve new competitive programming questions?
    I can't access it since I live in an unsupported country, and yes I tried using a vpn, it does not solve this issue submitted by /u/SohaibTheHuman [link] [comments]  ( 41 min )
    Multimodal Models are the Future (Deep Dive)
    submitted by /u/WaffleHouseBaby [link] [comments]  ( 49 min )
    Bard waitlist time?
    Anyone has gotten in? How long does it take? submitted by /u/heiferwithcheese [link] [comments]  ( 41 min )
    GPT impact on Micron ($MU) Stock.
    As I understand it, ChatGPT 3.0 could be run fairly efficiently on any system that has 100GB of VRAM. The amount of processing power isn't nearly as important as being able to fit the entire model in VRAM. It seems to me that very soon models will require something on the order of a terrabyte of VRAM to inference. It's confusing to me then why Nvidia stock has doubled as GPT3 and GPT4 have come out but Micron stock has stagnated. The memory (Micron) is what makes a bigger model possible, not the compute (Nvidia). Disclaimer: When I say "Micron" I mean any DRAM manufacturer (including SK hynix or Samsung). Micron is just easy to trade. submitted by /u/sasksean [link] [comments]  ( 7 min )
    David Muir: World's Most Daring Reporter? | Abc World News Tonight
    submitted by /u/Calatravo [link] [comments]  ( 41 min )
    College checking essays for AI?
    My (community) college stated that it would be checking our assignments for the use of AI. I thought that'd be fine since I only use Chatgpt (and Jenni) in school settings for grammar checks since Grammarly is out of my budget at $30/month. I've always enjoyed writing so obviously I'd like to do it on my own and only use AI as a "beta-reader" of sorts. Anyways, I had the idea to ask Chatgpt if it had written my assignments that I had done before I had access to the AI. I recognized a massive problem as it's saying "yes I wrote that passage" to things I wrote before ChatGPT even existed!! I am going to college to be a PSW and medical receptionist but I also am taking a Communications course. For context, my class only has 2 students, my campus is in a small town, and the size of the campus is probably around 3000 square feet AKA VERY SMALL. What do you guys think they're using to accurately check for AI? I'd like to test my (provably human-made) previous assignments with other AI programs that check... Also, my program officer did tell me that they are aware that Turnitin wrongly flags things as plagiarised when it's not. Can they really do this to students if the AI they use to check with isn't accurate? submitted by /u/Glad_Ease_8492 [link] [comments]  ( 42 min )
    First ever sentient artificial rhyme bot!
    submitted by /u/Powamotogang [link] [comments]  ( 41 min )
    I made J.A.R.V.I.S. using Chat GPT and Eleven labs
    submitted by /u/RideFuture [link] [comments]  ( 41 min )
    Crowd Computer LLM
    How feasible would a crowd computed large language model be from a technical aspect if someone were to obtain a large enough training data set? submitted by /u/UberSuperBoss [link] [comments]  ( 42 min )
    How to use Control net Batch processing to make Videos with Colab, Defor...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 6 min )
    Superflows is ChatGPT for 1-click email replies. Working in tech, I got annoyed at spending all my time in my inbox, so thought I'd build something to fix it
    submitted by /u/Ok-Craft-9908 [link] [comments]  ( 42 min )
    This New Text to Video AI is The Future of Movies!
    submitted by /u/PuppetHere [link] [comments]  ( 41 min )
    Pink Alcohol Ink Art | Pink wallpaper iphone
    submitted by /u/GaylordTurner [link] [comments]  ( 41 min )
    Nvidia's Picasso brings generative AI to Adobe, Getty Images and Shutterstock
    submitted by /u/Number_5_alive [link] [comments]  ( 41 min )
    RESEARCHERS REVEAL THAT PAPER ABOUT ACADEMIC CHEATING WAS GENERATED USING CHATGPT
    submitted by /u/TallSide7746 [link] [comments]  ( 41 min )
    From Narrow AI to Self-Improving AI: Are We Getting Closer to AGI?
    submitted by /u/RushingRobotics_com [link] [comments]  ( 6 min )
    We gotta go bigger…!
    submitted by /u/sharkymcstevenson2 [link] [comments]  ( 43 min )
    Fourier Transformations Reveal How AI Learns Complex Physics
    submitted by /u/Tao_Dragon [link] [comments]  ( 42 min )
    ChatGPT Glitch Exposes Private Chats To Random Users
    submitted by /u/vadhavaniyafaijan [link] [comments]  ( 41 min )
    Alpaca.cpp | Llama + Alpaca-13b + 64 COARS | ./Release/chat -t 120 -m ggml-alpaca-13b-q4
    Llama + Alpaca-13b + 64 COARS | ./Release/chat -t 120 -m ggml-alpaca-13b-q4 - YouTube Alpaca.cpp demo https://github.com/antimatter15/alpaca.cpp submitted by /u/APUsilicon [link] [comments]  ( 42 min )
    90s Grandma Illuminated In A Captivating Photorealistic Image - Photographic Moment Of '30s Dorothea Lange
    submitted by /u/Calatravo [link] [comments]  ( 41 min )
  • Open

    Any work try to play PC game with screenshot as obs and key/mouse as action?
    I am quite curious about whether there exist projects trying to solve real gaming problem with the very raw observation (screenshot) and raw action (imitating human player controlling key and mouse). I know a work from OpenAI called Video PreTraining (VPT) try to play Minecraft with screen/key/mouse. Can anyone share more works on this? submitted by /u/pengzhenghao [link] [comments]  ( 42 min )
    is there a community for RL? im looking at potentially asking others to join my research project but its hard to find anywhere thats appropriate to ask.
    I have been working on a very interesting project that aims to create an ensemble of models for a range of tasks in the Meta-ML domain. As someone who has had limited exposure to others interested in AI and has recently started exploring the field, I've made considerable progress on my own. However, I'm reaching out to find people with diverse perspectives and backgrounds who might be interested in joining me. The project involves designing models, developing workflows, and identifying data sources. Despite my relatively short time in the AI realm, I've come up with some novel approaches, such as custom hyperparameter tuning systems and convolutional layering methods, that I believe will help improve the models' ability to learn relationships in clean data while also allowing them to func…  ( 45 min )
    (Noob Question) Action-State Space Representation for Flappy Bird game
    Hello, I am new to the field, as I am about to finish my first course about it at school. As an assignment, the professor gave us to implement two Agents for playing the Flappy Bird Game in an environment provided by a fork to the gym library. At each step, the environment ouputs a tuple with the horizontal and vertical distances to the center of the gap between the pipes, as (h_dist, v_dist), and an integer height, representing the current height of the bird (i.e.: distance from the ground). Another important variable is the pipe_offset one, which represents the width of the pipe to pass through as an integer, used to initialize the environment. I want to implement the SARSA algorithm for TD control (both on and off policy) and I have some doubts on how to represent the Action-Stat…  ( 46 min )
    My agent seems to be learning but not on a stable way
    Hi! I am training a single agent on a environment based on Minigrid and i have run an experiment using Conv2D, and i got this avg reward on the last 20 episodes. It seems that my agent learns but at some point it is falling on local minimums? The structure of my neural network, both actor and critic are: Conv2D (16 filters) --> Conv2D (32 filters) --> Flatten --> Dense(128) --> Dense(64) --> Output All activated by ReLU and using Adam. This is the plot: https://preview.redd.it/iqxw7k8002pa1.jpg?width=622&format=pjpg&auto=webp&s=4ec2eaef54af80f4a13bdb28f20bdd29eeda5f74 Thanks in advance! submitted by /u/byMgmz [link] [comments]  ( 43 min )
  • Open

    Remote monitoring of raw material supply chains for sustainability with Amazon SageMaker geospatial capabilities
    Deforestation is a major concern in many tropical geographies where local rainforests are at severe risk of destruction. About 17% of the Amazon rainforest has been destroyed over the past 50 years, and some tropical ecosystems are approaching a tipping point beyond which recovery is unlikely. A key driver for deforestation is raw material extraction […]  ( 11 min )
    Best practices for viewing and querying Amazon SageMaker service quota usage
    Amazon SageMaker customers can view and manage their quota limits through Service Quotas. In addition, they can view near real-time utilization metrics and create Amazon CloudWatch metrics to view and programmatically query SageMaker quotas. SageMaker helps you build, train, and deploy machine learning (ML) models with ease. To learn more, refer to Getting started with […]  ( 5 min )
    Build custom code libraries for your Amazon SageMaker Data Wrangler Flows using AWS Code Commit
    As organizations grow in size and scale, the complexities of running workloads increase, and the need to develop and operationalize processes and workflows becomes critical. Therefore, organizations have adopted technology best practices, including microservice architecture, MLOps, DevOps, and more, to improve delivery time, reduce defects, and increase employee productivity. This post introduces a best practice […]  ( 12 min )
  • Open

    NVIDIA to Bring AI to Every Industry, CEO Says
    ChatGPT is just the start. With computing now advancing at what he called “lightspeed,” NVIDIA founder and CEO Jensen Huang today announced a broad set of partnerships with Google, Microsoft, Oracle and a range of leading businesses that bring new AI, simulation and collaboration capabilities to every industry. “The warp drive engine is accelerated computing, Read article >  ( 11 min )
    Fresh-Faced AI: NVIDIA Avatar Solutions Enhance Customer Service and Virtual Assistants
    Companies across industries are looking to use interactive avatars to enhance digital experiences. But creating them is a complex, time-consuming process requiring state-of-the-art AI models that can see, hear, understand and communicate with end users. To ease this process, NVIDIA is providing creators and developers with real-time AI solutions through Omniverse Avatar Cloud Engine (ACE), Read article >  ( 5 min )
    NVIDIA Metropolis Ecosystem Grows With Advanced Development Tools to Accelerate Vision AI
    With AI at its tipping point, AI-enabled computer vision is being used to address the world’s most challenging problems in nearly every industry. At GTC, a global conference for the era of AI and the metaverse running through Thursday, March 23, NVIDIA announced technology updates poised to drive the next wave of vision AI adoption. Read article >  ( 6 min )
    NVIDIA Studio at GTC: New AI-Powered Artistic Tools, Feature Updates, NVIDIA RTX Systems for Creators
    Powerful AI technologies are revolutionizing 3D content creation — whether by enlivening realistic characters that show emotion or turning simple texts into imagery. The brightest minds, artists and creators are gathering at NVIDIA GTC, a free, global conference on AI and the metaverse, taking place online through Thursday, March 23.  ( 9 min )
    From Concept to Production to Sales, NVIDIA AI and Omniverse Enable Automakers to Transform Their Entire Workflow
    The automotive industry is undergoing a digital revolution, driven by breakthroughs in accelerated computing, AI and the industrial metaverse. Automakers are digitalizing every phase of the product lifecycle — including concept and styling, design and engineering, software and electronics, smart factories, autonomous driving and retail — using the NVIDIA Omniverse platform and AI. Based on Read article >  ( 7 min )
    From Training AI in the Cloud to Running It on the Road, Transportation Leaders Trust NVIDIA DRIVE
    Transportation industry trailblazers are propelling their next-generation vehicles by building on NVIDIA DRIVE end-to-end solutions, which span the cloud to the car. The world’s best-selling new energy vehicle (NEV) brand BYD announced at NVIDIA GTC that it’s using the NVIDIA DRIVE Orin centralized compute platform to power an even wider range of vehicles within its Read article >  ( 6 min )
    Mitsui and NVIDIA Announce World’s First Generative AI Supercomputer for Pharmaceutical Industry
    Mitsui & Co., Ltd., one of Japan’s largest business conglomerates, is collaborating with NVIDIA on Tokyo-1 — an initiative to supercharge the nation’s pharmaceutical leaders with technology, including high-resolution molecular dynamics simulations and generative AI models for drug discovery. Announced today at the NVIDIA GTC global AI conference, the Tokyo-1 project features an NVIDIA DGX Read article >  ( 7 min )
    Omniverse at Scale: NVIDIA Announces Third-Generation OVX Computing Systems to Power Industrial Metaverse Applications
    Digitalization that combines AI and simulation is redefining how industrial products are created and transforming how people interact with the digital world. To help enterprises tackle complex new workloads, NVIDIA has unveiled the third generation of its NVIDIA OVX computing system. OVX is designed to power large-scale digital twins built on NVIDIA Omniverse Enterprise, a Read article >  ( 5 min )
    100+ Partners Bring NVIDIA Clara AI Healthcare Platform to Enterprises Worldwide
    Healthcare enterprises globally are working with NVIDIA to drive AI-accelerated solutions that are detecting diseases earlier from medical images, delivering critical insights to care teams and revolutionizing drug discovery workflows. NVIDIA Clara, a suite of software and services that powers AI healthcare solutions, is enabling this transformation industry-wide. The Clara ecosystem includes BioNeMo for drug Read article >  ( 7 min )
    NVIDIA Omniverse Accelerates Game Content Creation With Generative AI Services and Game Engine Connectors
    Powerful AI technologies are making a massive impact in 3D content creation and game development. Whether creating realistic characters that show emotion or turning simple texts into imagery, AI tools are becoming fundamental to developer workflows — and this is just the start. At NVIDIA GTC and the Game Developers Conference (GDC), learn how the Read article >  ( 7 min )
    BMW Group Starts Global Rollout of NVIDIA Omniverse
    BMW Group is at the forefront of a key new manufacturing trend — going digital-first by using the virtual world to optimize layouts, robotics and logistics systems years before production really starts. The automaker announced today with NVIDIA at GTC that it’s expanding its use of the NVIDIA Omniverse platform for building and operating industrial Read article >  ( 6 min )
    NVIDIA and Partners Ecosystem Release New Omniverse Connections, Expanding Foundation for Artists and Developers to Advance 3D Workflows
    Developers and creators can better realize the massive potential of generative AI, simulation and the industrial metaverse with new Omniverse Connectors and other updates to NVIDIA Omniverse, a platform for creating and operating metaverse applications. Omniverse Cloud, a platform-as-a-service unveiled today at NVIDIA GTC, equips users with a range of simulation and generative AI capabilities Read article >  ( 7 min )
    NVIDIA Expands Isaac Software Access and Jetson Platform Availability, Accelerating Robotics From Cloud to Edge
    NVIDIA announced today at GTC that Omniverse Cloud will be hosted on Microsoft Azure, increasing access to Isaac Sim, the company’s platform for developing and managing AI-based robots. The company also said that a full lineup of Jetson Orin modules is now available, offering a performance leap for edge AI and robotics applications. “The world’s Read article >  ( 6 min )
    AI Speeds Insurance Claims Estimates for Better Policyholder Experiences
    CCC Intelligent Solutions (CCC) has become the first company in the auto insurance industry to deliver an AI-powered repair estimating solution, called CCC Estimate – STP, short for straight-through processing. The Chicago-based auto-claims technology powerhouse uses AI, insurer-driven rules and CCC’s vast ecosystem to deliver repair estimates in seconds, instead of days. It’s a technological Read article >  ( 6 min )
    Moving Pictures: NVIDIA, Getty Images Collaborate on Generative AI
    As a sports commentator for a professional lacrosse team, Grant Farhall knows the value in having the right teammates. As the chief product officer for Getty Images, a global visual-content creator and marketplace, he believes the collaboration between his company and NVIDIA is an excellent pairing for taking generative AI to the next level. The Read article >  ( 5 min )
    Mind the Gap: Large Language Models Get Smarter With Enterprise Data
    Large language models available today are incredibly knowledgeable, but act like time capsules — the information they capture is limited to the data available when they were first trained. If trained a year ago, for example, an LLM powering an enterprise’s AI chatbot won’t know about the latest products and services at the business. With Read article >  ( 6 min )
    Green Light: NVIDIA Grace CPU Paves Fast Lane to Energy-Efficient Computing for Every Data Center
    The results are in, and they point to a new era in energy-efficient computing. In tests of real workloads, the NVIDIA Grace CPU Superchip scored 2x performance gains over x86 processors at the same power envelope across major data center CPU applications. That opens up a whole new set of opportunities. It means data centers Read article >  ( 6 min )
    NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
    Microsoft, Tencent and Baidu are adopting NVIDIA CV-CUDA for computer vision AI. NVIDIA CEO Jensen Huang highlighted work in content understanding, visual search and deep learning Tuesday as he announced the beta release for NVIDIA’s CV-CUDA — an open-source, GPU-accelerated library for computer vision at cloud scale. “Eighty percent of internet traffic is video, user-generated Read article >  ( 6 min )
  • Open

    Hyperparameter Tuning Techniques in Machine Learning Engineering
    Image designed by the author – Shanthababu Introduction Every ML Engineer and Data Scientist must understand the significance of “Hyperparameter Tuning (HPs-T)” while selecting the right machine/deep learning model and improving the performance of the model(s). To make it simple, for every single machine learning model selection is a major exercise and it is purely dependent on… Read More »Hyperparameter Tuning Techniques in Machine Learning Engineering The post Hyperparameter Tuning Techniques in Machine Learning Engineering appeared first on Data Science Central.  ( 27 min )
    To Build a Future-Focused Career, Learn Blockchain Technology
    Blockchain is a trending topic in the industry as more organizations are using the technology to improve their business operations and process payments between various clients. Modern organizations are preparing to use blockchain technology amid the global pandemic. Many countries are exploring the option of blockchain in navigating the COVID-19 pandemic, especially in the supply… Read More »To Build a Future-Focused Career, Learn Blockchain Technology The post To Build a Future-Focused Career, Learn Blockchain Technology appeared first on Data Science Central.  ( 20 min )
  • Open

    GPT 5 next gen : 7 upcoming abilities to transform AI and the future of tech
    submitted by /u/HastyNationality [link] [comments]  ( 41 min )
    Zero-1-to-3: Zero-shot One Image to 3D Object
    submitted by /u/nickb [link] [comments]  ( 6 min )

  • Open

    Extreme Trippy Jungle Fever Hallucinations (AI Dream 154)
    submitted by /u/LordPewPew777 [link] [comments]  ( 41 min )
    MaMi 💄 Your Personal Assistant
    submitted by /u/freshthreadshop [link] [comments]  ( 41 min )
    GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
    submitted by /u/Fabrizio89 [link] [comments]  ( 41 min )
    I used Kaiber AI for my new song's music video, and the result made me super emotional!
    My name is Meirav Hellinger and I am a singer songwriter from Israel. Around two weeks prior to the release of my new song, I started to feel like this super personal pop ballad song i wrote a while ago, needed an animated visual element to is. it is like this song was asking for an animated music video, But as an indie pop musician you often find yourself need to think different and find a way to do the right thing artistically end on a budget. So I found myself put in my trust in artificial intelligence, or as most of you call it AI. I used kaiber AI to create this music video and the result caught me by surprise. Hope you will find it interesting as well:) Link to video: https://www.youtube.com/watch?v=A2dZhKtyvJY submitted by /u/meirav_hellinger [link] [comments]  ( 42 min )
    Reliable text-to-image AI statistics
    Hi. I'm in the process of writing my bachelor thesis about Text-to-image AI applications, but I can't find any reliable source of statistical informations (maybe active users, etc.) about most popular apps (Dall-E 2, Midjourney, Stable Diffusion). Are they available anywhere? What other data I can include and where to find it? Thank You and have a nice day! submitted by /u/HiguU [link] [comments]  ( 43 min )
    Help finding prompt writing guide for blender-style dessert food art
    I recall seeing a step-by-step guide on how to generate this dynamic beautiful blender-style food art. He generated multiple images with different desserts in different scenes but all had a very consistent style. One image might’ve had a donut falling into a cup of milk with the milk splashing up. I’ve searched Medium, Google Images, Instagram and even LinkedIn - anyone I thought I’ve might’ve seen the post but can’t locate it. Does anyone know what I’m talking about? If you, please could you provide a link? submitted by /u/Mundane_Plenty8305 [link] [comments]  ( 7 min )
    Custom AI Model creation question
    I'm trying to learn how to make a custom AI model but everywhere I search for it is using an AI model that was already created. My goal is to make a car model detection AI model but I can't find anything on the topic of car recognition besides already created models that only recognize a car in the image but not the model. How should I go about making this project? submitted by /u/Alphac3ll [link] [comments]  ( 42 min )
    Weekly China AI News: Baidu’s ERNIE Bot, Kai-Fu Lee’s AI Venture, ChatGLM, and Ren Zhengfei's Opinion on ChatGPT
    submitted by /u/kizumada [link] [comments]  ( 41 min )
    World's first full text to video a.i music video madeusing Model Scope a.i
    submitted by /u/kingcon2k11 [link] [comments]  ( 6 min )
    Realtime conversational email assistant with lifelike voice
    submitted by /u/justLV [link] [comments]  ( 42 min )
    Fans Are Using Artificial Intelligence to Turn Rap Snippets Into Full Songs
    submitted by /u/fiishyfiishy [link] [comments]  ( 41 min )
    Weekend Project - Invent List
    I built this website. http://inventlist.com — Hand-curated AI projects to explore and learn from. Let's dive in and discover the possibilities of this exciting field together! Which AI product are you using nowadays and find interesting? submitted by /u/mddanishyusuf [link] [comments]  ( 41 min )
    AI Detector
    hey. does anyone have a paraphrasing website that efficiently bypasses ai detection? submitted by /u/topvisuals [link] [comments]  ( 41 min )
    Is GPT-4 coming for your job? OpenAI has a prediction
    submitted by /u/Peaking_AI [link] [comments]  ( 41 min )
    How are websites going to prevent AI's from completing captcha's?
    Given how advanced AI's are becoming, I'm foreseeing loads of bad actors creating massive amounts of accounts on various websites that require the completion of a captcha to register. Not just registering, numerous other things that websites use captchas for, such as payments, postings, etc. Genuinely alarming times we are about to be in. submitted by /u/Nanochip [link] [comments]  ( 42 min )
    What happens when every app out there has a chat feature?
    So we are reliving the era where every website/app suddenly had 'stories' and copied Snap. Except this time around everyone is adding chat. Part of me thinks this is an exciting feature where every IoT device has voice control, and every app has chat. So we can basically interact in a very natural form with computers. But another part of me thinks it'd suck to have all these different conversations with each app. Ideally I'd just talk to my phone/AI/computer and it'd have full context of all conversations. What do you think the future of these AI chat features will look like? submitted by /u/geepytee [link] [comments]  ( 7 min )
    StabilityAI built a plugin for Photoshop, what an absolute game changer! Designing just got so much more interesting. Every week, there's something new in AI, got this lead from an AI Roundup here:
    submitted by /u/adititalksai [link] [comments]  ( 41 min )
    Text-To-Video Generation Model Gen-2 is introduced by Runway as multi-modal AI system that can generate novel videos with text, images, or video clips.
    submitted by /u/CyberKaliyugiNepali [link] [comments]  ( 44 min )
    ChatGPT has been learning through responses over time
    before it was iffy about the current month https://aiarchives.org/id/oCVHGNHq0ryNXHexaTbv now very confident about date and month https://aiarchives.org/id/2RVFVcrwDlCFGeXcjXZT submitted by /u/Manuallyforeshow134 [link] [comments]  ( 6 min )
    Everything is a Remix
    submitted by /u/smorga [link] [comments]  ( 41 min )
    People still behaving like everything is normal!
    submitted by /u/1024cities [link] [comments]  ( 41 min )
    Looking for article recommendations to continue exploring the future of commercialization of and business models behind generative AI businesses (e.g., commercializing infrastructure, foundation models, and applications built with generative AI). Please list any suggestions in the comments. Thanks!
    submitted by /u/----bubba---- [link] [comments]  ( 41 min )
    I built a GPT web app to automate searching and parsing answers from Google
    submitted by /u/geepytee [link] [comments]  ( 41 min )
    AI voice changing is a game changer
    submitted by /u/Xerxos [link] [comments]  ( 41 min )
    I Made an AI to analyse earnings call
    I've developed an AI to analyze earnings call sentiment, which can help us cut through the noise. Here are some examples of popular stocks Apple Tesla silicon valley bank ​ my goal is provide a friendly and professional approach to investing, so anyone can do it, nomatter the background and i look forward to hearing what you guys think submitted by /u/SayNo2Tennis [link] [comments]  ( 41 min )
  • Open

    [D] 3070 Ti (8GB) vs 3060 (12GB) for Machine Learning
    Hello, I'm interested in learning about machine learning and using my GPU for it. I bought a 3070 Ti since it is a pretty good GPU and I got it at a good price. (aside from AI, I'll use it for Blender rendering and some games, for which it'll be better at than the 3060) But after getting it I realized that for AI applications people prefer cards with more VRAM, and the 3070 Ti has only 8GB while the 3060 has 16GB. For example, this seemed interesting but they didn't mention it working with the 3070 Ti or even the 3070. Should I have gotten a 3060 instead? A 3080 is possible too but I didn't really think it was worth it. submitted by /u/ScienceByte [link] [comments]  ( 44 min )
    [D] Is ML doomed to end up closed-source?
    So basically, OpenAI is keeping its models a secret, Hugging Face added a new gated feature, and LLaMA is using a non-commercial license. It looks like companies are all moving towards closed-source and monopolizing ML. I've always loved Hugging Face, but now they are doing the opposite of what they preach with this new gated feature thing, this is just not open-source and shouldn't be encouraged in the first place. Open AI clearly stated that you can't "use output from the Services to develop models that compete with OpenAI" Google shared its paper Attention Is All You Need transparently which was a breakthrough in NLP and got utilized by OpenAI (with many other papers) to build GPT-4 which is adopted by Bing and now posing risk to Google's business. As a consequence, could companies start to avoid sharing research openly and rather monopolize their work for the sake of their own business safety? Also, assuming we will witness more of these closed-source models. is it safe to just trust them without understanding what data they got exactly trained on? This doesn't seem to make sense, not sure how this would end up. submitted by /u/SuccessfulLeadership [link] [comments]  ( 50 min )
    [R][D] Papers on Transductive Learning
    Hi all, I'm trying to find some good papers on transductive learning. I'm looking for newly published ones in general however papers which aren't that recent but had a good impact would also be nice to read. I've been searching for some papers however I do not want to miss out on the really good ones. So could anyone suggest papers on transductive learning which you think that I should not miss out? Also I'm not sure if this is the right subreddit for this but, there is something which I'm struggling with recently. I have to conduct my literature review but it's too difficult really. And it takes too long to understand an article. Do you guys also have some suggestions on how I could read an article more efficiently so that I could read multiple articles in a single day? submitted by /u/theMightyAvokado [link] [comments]  ( 43 min )
    [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative)
    OpenAssistant bot is live on /r/ask_open_assistant. There are some limitations to the reddit bot; you can also try on the model in chat mode at https://huggingface.co/spaces/olivierdehaene/chat-llm-streaming. Model is available for free download at https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b. Prompt it by creating a new text post (responds to text body of post), starting a comment with !OpenAssistant, or by replying directly to it. submitted by /u/pixiegirl417 [link] [comments]  ( 44 min )
    [D] Hyperparameter robustness of RL algorithms
    I've gone through a lot of RL algorithms recently and a lot of them seem to be very sensitive to hyperparameters with performances varying by degrees of +/-10 in some cases in a scale of 100. Do reviewers consider them as limitations when evaluating these algorithms and yet they still get published, what's the way forward in the field of RL to reduce this? submitted by /u/Cool_Abbreviations_9 [link] [comments]  ( 43 min )
    [D] Determining quality of training images with some metrics
    Hello ML sub, How does one evaluate the quality of training images before actually training a model ? Training a model is surely expensive. What if one had a way of sort of ascertaining that the image quality of a training set for a particular task (say object detection or semantic segmentation etc) ? It doesn't have to be perfect but some kind of hint... Could you please point me to some papers or studies or discussions on this ? There are some objective metrics like PSNR or SSIM but they need a reference image submitted by /u/i_sanitize_my_hands [link] [comments]  ( 7 min )
    [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset
    How to fine-tune Facebooks 30 billion parameter LLaMa on the Alpaca data set. Blog post: https://abuqader.substack.com/p/releasing-alpaca-30b Weights: https://huggingface.co/baseten/alpaca-30b submitted by /u/imgonnarelph [link] [comments]  ( 45 min )
    [P] Make AI Robust and Trustworthy with CAPSA!
    Modern AI models show great potential across various applications, but their deployment in everyday life is limited due to a lack of trustworthiness. While accuracy is crucial, AI models must also recognize when they can and cannot be trusted to make decisions, especially in safety-critical systems. To bridge this gap, it’s essential to develop AI models with built-in trust mechanisms for reliable decision-making in real-world scenarios. Trustworthiness in AI models can be improved by addressing three risk sources: Representation Bias, Epistemic Uncertainty, and Aleatoric Uncertainty. ​ Representation Bias refers to the potential for the model to favor certain groups or types of data over others, leading to inaccuracies in its predictions with under-represented data. Epistemic Uncert…  ( 45 min )
    CoLT5: Faster Long-Range Transformers with Conditional Computation
    submitted by /u/nlprogress [link] [comments]  ( 43 min )
    IJCAI 2023 Reviews discussion [D]
    This is my first time submitting to IJCAI. Any comments on how to respond to the reviews are welcome. Any help is appreciated. submitted by /u/BigJuggernaut7380 [link] [comments]  ( 43 min )
    [D]: Vanishing Gradients and Resnets
    I am working with Resnets consisting of feedforward networks. Additionally, I am using Kaiming-He weight initialisation and ReLU as an activation function. Extending the network to more than 10 layers leads to vanishing gradients. I cannot use batch normalization because that would violate assumptions of a gradient penalty. What should I do? Should I form residual connections over longer steps? Should I implement artificial derivatives? What's the common remedy here? submitted by /u/Blutorangensaft [link] [comments]  ( 43 min )
    [R] CodeAlpaca - Instruction following model to generate code
    Finetuned Stanford Alpaca on code generation instructions. Demo - https://code-alpaca-demo.vercel.app/ Working on open sourcing the code and data. submitted by /u/immune_star [link] [comments]  ( 45 min )
    ICML rebuttals still not visible to reviewers? [D]
    The rebuttals for ICML submissions were supposed to become visible to reviewers at 3pm ET on Sunday, with the author-reviewer discussion period beginning today at 10am ET, but OpenReview says my rebuttals are not visible to the reviewers yet. Is anyone else having this problem? Am worried I somehow made them only visible to the chairs submitted by /u/snaquiche [link] [comments]  ( 44 min )
    [P] Action Recognition in Computer Vision
    Action recognition is a difficult task. Mainly because of its vagueness. Type of actions, duration, ontology, etc. I made a complete overview of the problem relevant today. Here is a collection of approaches, networks, and problem statements. It will help you clarify the task the next time you meet it. https://medium.com/@zlodeibaal/action-recognition-in-the-wild-9eb7f12b4d12 submitted by /u/Wormkeeper [link] [comments]  ( 43 min )
    Smarty-GPT: wrapper of prompts/contexts [P]
    This is a simple wrapper that introduces any imaginable complex context to each question submitted to Open AI API. The main goal is to enhance the accuracy obtained in its answers in a TRANSPARENT way to end users. https://github.com/citiususc/Smarty-GPT submitted by /u/usc-ur [link] [comments]  ( 43 min )
    [Discussion] In which way could Machine Learning be useful for a journaling app?
    As per Title, I would like to use Machine Learning to make the journaling expierence better. I got some Questions in that Regard. Firstly, is it meaningful to host the model on the users device, to keep the Data safe? What do you suggest would be a useful way? A chat that response to the entries? Or meaningful prompts? Should the Model learn only from what the user has written in his journal or be pretrained of scientific data or other data to respond to what the user has written accordingly? submitted by /u/Mojomoto93 [link] [comments]  ( 44 min )
    [News] Prompt engineering blog from OpenAI applied AI head
    submitted by /u/LouisAckerman [link] [comments]  ( 43 min )
    [D] IJCAI 2023 Rebuttal Discussion
    Title submitted by /u/snu95 [link] [comments]  ( 44 min )
    [D] Best ChatBot that can be run locally?
    What do you guys think is currently the best ChatBot that you can download and run offline? After hearing that Alpaca has results similar to GPT-3, I was curious if anything else competes. submitted by /u/rustymonster2000 [link] [comments]  ( 45 min )
  • Open

    What are tuning improvements that can be made to a basic DQN?
    I have followed this YouTube tutorial that can be run with my environment, but it seems relatively basic (for context, the game in the video is quite simple while the environment I am using is like a more complex chess) I have heard of DDQN and other improvements to DQN, but was wondering if there is anything within basic DQN (like maybe stuff to do with the network) that can be tuned to produce better results (Thanks for taking the time to look at this) submitted by /u/PainisPingas [link] [comments]  ( 42 min )
    What has replaced OpenAI Retro Gym?
    OpenAI Retro Gym hasn't been updated in years, despite being high profile enough to garner 3k stars. It doesn't even support Python 3.9, and needs old versions of setuptools and gym to get installed. Can anything else replaced it? The closest thing I could find is MAMEToolkit, which also hasn't been updated in years. Arcade Learning Environment is OK but it's just for Atari 2600. I'm looking for an RL framework which can generate envs that interact with game emulators. submitted by /u/PlasticShape [link] [comments]  ( 43 min )
    Skills expected from RL folks
    Hi everyone. I want to get into roles that are focussed on RL (can be research labs also). I dont have a PhD. I have a background in CV and DL applied to CV problems. Few of my interest areas are Robotics, Recommendation systems. I am open to explore other fields. I would like to what skills should I cultivate that will help get such roles or maybe standout from others. I have completed video lectures from David Silver and I am reading Sutton and Barto along with working on exercises and implementing various algorithms (help from CleanRL). I would want to know what else should I do to level up with people who have done Masters or PhD. Thanks in advance for your help. PS : I am working in a full time role in India and opting for masters or PhD is not viable because of financial constraints submitted by /u/chhaya_35 [link] [comments]  ( 43 min )
    How to deal with situations where the RL agent cannot act at every time step?
    I have a MDP where an RL agent is embedded inside a larger controller. The controller emits actions at every time step. But the RL agent can override the controller's action (and pick its own according to some implicit DQN policy) only at some time steps. Whether the action comes from the controller or the RL agent doesn't matter to the environment—it will return a new state and possibly some reward at every time step. Is the right way to model this to treat the controller as part of the environment of the RL agent? And operate according to a "partial" MDP where only the RL agent's actions are considered "true" actions, whereas the controller's actions are subsumed as part of the dynamics? In that case, I would have time steps of non-uniform duration, which seems potentially problematic. Or is there a better way to tackle this? Any help would be appreciated, thanks! submitted by /u/wardellinthehouse [link] [comments]  ( 46 min )
  • Open

    Detailed images from space offer clearer picture of drought effects on plants
    J-WAFS researchers are using remote sensing observations to build high-resolution systems to monitor drought.  ( 9 min )
  • Open

    Accelerate Amazon SageMaker inference with C6i Intel-based Amazon EC2 instances
    This is a guest post co-written with Antony Vance from Intel. Customers are always looking for ways to improve the performance and response times of their machine learning (ML) inference workloads without increasing the cost per transaction and without sacrificing the accuracy of the results. Running ML workloads on Amazon SageMaker running Amazon Elastic Compute […]  ( 8 min )
  • Open

    Tackling the Evolving Tech Landscape the Smarter Way
    We thought of listing down some tactics that can help you tackle digital challenges smartly. Check out the blog to know what we are talking about. The post Tackling the Evolving Tech Landscape the Smarter Way appeared first on Data Science Central.  ( 21 min )
    Top 6 Data Science Books for Beginners – 2023
    The data science field is growing fast, and the field has the vast potential to revolutionize how people live and work. With the increasing amount of data being produced, it has become more crucial for data science professionals to understand the tools and techniques to interrupt data. If you are a beginner or experienced data… Read More »Top 6 Data Science Books for Beginners – 2023 The post Top 6 Data Science Books for Beginners – 2023 appeared first on Data Science Central.  ( 21 min )
    Enterprise use cases for GPT-3: How to chat with your own data
    It’s easy to think of LLMs (large language models) as just ‘hallucinating’ or mere generators of text. A glorified LSTM so to speak. While there are some limitations of LLMs (and indeed they are evolving), a far more interesting question to explore is: How can LLMs be used in enterprise applications?  In many ways, enterprise… Read More »Enterprise use cases for GPT-3: How to chat with your own data The post Enterprise use cases for GPT-3: How to chat with your own data appeared first on Data Science Central.  ( 19 min )
    Export EDB to PST Exchange 2010 Powershell
    Exporting Exchange Database (EDB) files to Personal Storage Table (PST) files is a frequent job in Exchange Server management. This is done for a variety of purposes, including data backup, archiving, and moving to a new email server.  In this blog article, we will walk you through the process of how to export edb to… Read More »Export EDB to PST Exchange 2010 Powershell The post Export EDB to PST Exchange 2010 Powershell appeared first on Data Science Central.  ( 21 min )
    Future of Education: Application not Regurgitation of Knowledge – Part II
    AI technologies like ChatGPT are necessitating a fundamental overhaul of our educational systems and institutions.  Getting the right answers to predetermined tests is no longer sufficient in an age where AI can access, integrate, and recite knowledge billions if not trillions of times faster than the human mind. So, what are the skills, capabilities, and… Read More »Future of Education: Application not Regurgitation of Knowledge – Part II The post Future of Education: Application not Regurgitation of Knowledge – Part II appeared first on Data Science Central.  ( 23 min )
    E-commerce in 2023 — Top 5 Tech Trends that will Reshape the Industry
    The E-commerce industry has been at the forefront of the transformation in the era of technology — it has reshaped everything from how customers shop to how the whole e-commerce things operate. Technology has led to significant changes in the e-commerce and retail industries over the past few years. Consumers now have more access to… Read More »E-commerce in 2023 — Top 5 Tech Trends that will Reshape the Industry The post E-commerce in 2023 — Top 5 Tech Trends that will Reshape the Industry appeared first on Data Science Central.  ( 26 min )
  • Open

    Solid angle of a star
    The apparent size of a distant object can be measured by projecting the object onto a unit sphere around the observer and calculating the area of the projected image. A unit sphere has area 4π. If you’re in a ship far from land, the solid angle of the sky is 2π steradians because it takes […] Solid angle of a star first appeared on John D. Cook.  ( 6 min )
  • Open

    Smarty-GPT: wrapper of prompts/contexts
    This is a simple wrapper that introduces any imaginable complex context to each question submitted to Open AI API. The main goal is to enhance the accuracy obtained in its answers in a TRANSPARENT way to end users. https://github.com/citiususc/Smarty-GPT submitted by /u/usc-ur [link] [comments]  ( 41 min )
  • Open

    Geometric Deep Learning for Molecular Crystal Structure Prediction. (arXiv:2303.10140v1 [cond-mat.mtrl-sci])
    We develop and test new machine learning strategies for accelerating molecular crystal structure ranking and crystal property prediction using tools from geometric deep learning on molecular graphs. Leveraging developments in graph-based learning and the availability of large molecular crystal datasets, we train models for density prediction and stability ranking which are accurate, fast to evaluate, and applicable to molecules of widely varying size and composition. Our density prediction model, MolXtalNet-D, achieves state of the art performance, with lower than 2% mean absolute error on a large and diverse test dataset. Our crystal ranking tool, MolXtalNet-S, correctly discriminates experimental samples from synthetically generated fakes and is further validated through analysis of the submissions to the Cambridge Structural Database Blind Tests 5 and 6. Our new tools are computationally cheap and flexible enough to be deployed within an existing crystal structure prediction pipeline both to reduce the search space and score/filter crystal candidates.  ( 2 min )
    Neighborhood Averaging for Improving Outlier Detectors. (arXiv:2303.09972v1 [cs.LG])
    We hypothesize that similar objects should have similar outlier scores. To our knowledge, all existing outlier detectors calculate the outlier score for each object independently regardless of the outlier scores of the other objects. Therefore, they do not guarantee that similar objects have similar outlier scores. To verify our proposed hypothesis, we propose an outlier score post-processing technique for outlier detectors, called neighborhood averaging(NA), which pays attention to objects and their neighbors and guarantees them to have more similar outlier scores than their original scores. Given an object and its outlier score from any outlier detector, NA modifies its outlier score by combining it with its k nearest neighbors' scores. We demonstrate the effectivity of NA by using the well-known k-nearest neighbors (k-NN). Experimental results show that NA improves all 10 tested baseline detectors by 13% (from 0.70 to 0.79 AUC) on average evaluated on nine real-world datasets. Moreover, even outlier detectors that are already based on k-NN are also improved. The experiments also show that in some applications, the choice of detector is no more significant when detectors are jointly used with NA, which may pose a challenge to the generally considered idea that the data model is the most important factor. We open our code on www.outlierNet.com for reproducibility.  ( 2 min )
    Self-Distillation for Further Pre-training of Transformers. (arXiv:2210.02871v2 [cs.CV] UPDATED)
    Pre-training a large transformer model on a massive amount of unlabeled data and fine-tuning it on labeled datasets for diverse downstream tasks has proven to be a successful strategy, for a variety of vision and natural language processing tasks. However, direct fine-tuning of the pre-trained model may be suboptimal if there exist large discrepancies across data domains for pre-training and fine-tuning. To tackle this issue, several previous studies have proposed further pre-training strategies, where we continue to pre-train the model on the target unlabeled dataset before fine-tuning. However, all of them solely focus on language models and we empirically find that a Vision Transformer is vulnerable to overfitting as we continue to pretrain the model on target unlabeled data. In order to tackle this limitation, we propose self-distillation as a regularization for a further pre-training stage. Specifically, we first further pre-train the initial pre-trained model on the target unlabeled data and then consider it as a teacher for self-distillation. Then we take the same initial pre-trained model as a student and enforce its hidden representations to be close to those of the teacher while optimizing the student with a masked auto-encoding objective. We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks. Experimentally, we show that our proposed method outperforms all the relevant baselines. Theoretically, we analyze the proposed method with a simplified model to understand how self-distillation for further pre-training can potentially help improve the performance of the downstream tasks.  ( 3 min )
    RGMIM: Region-Guided Masked Image Modeling for COVID-19 Detection. (arXiv:2211.00313v2 [cs.CV] UPDATED)
    Purpose: Self-supervised learning is rapidly advancing computer-aided diagnosis in the medical field. Masked image modeling (MIM) is one of the self-supervised learning methods that masks a subset of input pixels and attempts to predict the masked pixels. Traditional MIM methods often employ a random masking strategy. In comparison to ordinary images, medical images often have a small region of interest for disease detection. Consequently, we focus on fixing the problem in this work, which is evaluated by automatic COVID-19 identification. Methods: In this study, we propose a novel region-guided masked image modeling method (RGMIM) for COVID-19 detection in this paper. In our method, we devise a new masking strategy that employed lung mask information to identify valid regions to learn more useful information for COVID-19 detection. The proposed method was contrasted with five self-supervised learning techniques (MAE, SKD, Cross, BYOL, and, SimSiam). We present a quantitative evaluation of open COVID-19 CXR datasets as well as masking ratio hyperparameter studies. Results: When using the entire training set, RGMIM outperformed other comparable methods, achieving 0.962 detection accuracy. Specifically, RGMIM significantly improved COVID-19 detection in small data volumes, such as 5% and 10% of the training set (846 and 1,693 images) compared to other methods, and achieved 0.957 detection accuracy even when only 50% of the training set was used. Conclusions: RGMIM can mask more valid lung-related regions, facilitating the learning of discriminative representations and the subsequent high-accuracy COVID-19 detection. RGMIM outperforms other state-of-the-art self-supervised learning methods in experiments, particularly when limited training data is used. RGMIM also has the potential to be applied to other medical images.  ( 3 min )
    HIVE: Harnessing Human Feedback for Instructional Visual Editing. (arXiv:2303.09618v1 [cs.CV])
    Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences. We hypothesize that state-of-the-art instructional image editing models, where outputs are generated based on an input image and an editing instruction, could similarly benefit from human feedback, as their outputs may not adhere to the correct instructions and preferences of users. In this paper, we present a novel framework to harness human feedback for instructional visual editing (HIVE). Specifically, we collect human feedback on the edited images and learn a reward function to capture the underlying user preferences. We then introduce scalable diffusion model fine-tuning methods that can incorporate human preferences based on the estimated reward. Besides, to mitigate the bias brought by the limitation of data, we contribute a new 1M training dataset, a 3.6K reward dataset for rewards learning, and a 1K evaluation dataset to boost the performance of instructional image editing. We conduct extensive empirical experiments quantitatively and qualitatively, showing that HIVE is favored over previous state-of-the-art instructional image editing approaches by a large margin.  ( 2 min )
    A Non-Asymptotic Framework for Approximate Message Passing in Spiked Models. (arXiv:2208.03313v2 [math.ST] UPDATED)
    Approximate message passing (AMP) emerges as an effective iterative paradigm for solving high-dimensional statistical problems. However, prior AMP theory -- which focused mostly on high-dimensional asymptotics -- fell short of predicting the AMP dynamics when the number of iterations surpasses $o\big(\frac{\log n}{\log\log n}\big)$ (with $n$ the problem dimension). To address this inadequacy, this paper develops a non-asymptotic framework for understanding AMP in spiked matrix estimation. Built upon new decomposition of AMP updates and controllable residual terms, we lay out an analysis recipe to characterize the finite-sample behavior of AMP in the presence of an independent initialization, which is further generalized to allow for spectral initialization. As two concrete consequences of the proposed analysis recipe: (i) when solving $\mathbb{Z}_2$ synchronization, we predict the behavior of spectrally initialized AMP for up to $O\big(\frac{n}{\mathrm{poly}\log n}\big)$ iterations, showing that the algorithm succeeds without the need of a subsequent refinement stage (as conjectured recently by \citet{celentano2021local}); (ii) we characterize the non-asymptotic behavior of AMP in sparse PCA (in the spiked Wigner model) for a broad range of signal-to-noise ratio.  ( 2 min )
    Leveraging Large Language Models for Multiple Choice Question Answering. (arXiv:2210.12353v3 [cs.CL] UPDATED)
    While large language models (LLMs) like GPT-3 have achieved impressive results on multiple choice question answering (MCQA) tasks in the zero, one, and few-shot settings, they generally lag behind the MCQA state of the art (SOTA). MCQA tasks have traditionally been presented to LLMs like cloze tasks. An LLM is conditioned on a question (without the associated answer options) and its chosen option is the one assigned the highest probability after normalization (for length, etc.). A more natural prompting approach is to present the question and answer options to the LLM jointly and have it output the symbol (e.g., "A") associated with its chosen answer option. This approach allows the model to explicitly compare answer options, reduces computational costs, and mitigates the effects of tokenization scheme and answer option representations on answer selection. For the natural approach to be effective, the LLM it is used with must be able to associate answer options with the symbols that represent them. The LLM needs what we term multiple choice symbol binding (MCSB) ability. This ability varies greatly by model. We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach across 20 diverse datasets and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated.  ( 3 min )
    Tiny, always-on and fragile: Bias propagation through design choices in on-device machine learning workflows. (arXiv:2201.07677v4 [cs.LG] UPDATED)
    Billions of distributed, heterogeneous and resource constrained IoT devices deploy on-device machine learning (ML) for private, fast and offline inference on personal data. On-device ML is highly context dependent, and sensitive to user, usage, hardware and environment attributes. This sensitivity and the propensity towards bias in ML makes it important to study bias in on-device settings. Our study is one of the first investigations of bias in this emerging domain, and lays important foundations for building fairer on-device ML. We apply a software engineering lens, investigating the propagation of bias through design choices in on-device ML workflows. We first identify reliability bias as a source of unfairness and propose a measure to quantify it. We then conduct empirical experiments for a keyword spotting task to show how complex and interacting technical design choices amplify and propagate reliability bias. Our results validate that design choices made during model training, like the sample rate and input feature type, and choices made to optimize models, like light-weight architectures, the pruning learning rate and pruning sparsity, can result in disparate predictive performance across male and female groups. Based on our findings we suggest low effort strategies for engineers to mitigate bias in on-device ML.  ( 3 min )
    Learning-Based Adaptive Control for Stochastic Linear Systems with Input Constraints. (arXiv:2209.07040v3 [eess.SY] UPDATED)
    We propose a certainty-equivalence scheme for adaptive control of scalar linear systems subject to additive, i.i.d. Gaussian disturbances and bounded control input constraints, without requiring prior knowledge of the bounds of the system parameters, nor the control direction. Assuming that the system is at-worst marginally stable, mean square boundedness of the closed-loop system states is proven. Lastly, numerical examples are presented to illustrate our results.  ( 2 min )
    Detecting Political Biases of Named Entities and Hashtags on Twitter. (arXiv:2209.08110v2 [cs.SI] UPDATED)
    Ideological divisions in the United States have become increasingly prominent in daily communication. Accordingly, there has been much research on political polarization, including many recent efforts that take a computational perspective. By detecting political biases in a corpus of text, one can attempt to describe and discern the polarity of that text. Intuitively, the named entities (i.e., the nouns and the phrases that act as nouns) and hashtags in text often carry information about political views. For example, people who use the term "pro-choice" are likely to be liberal, whereas people who use the term "pro-life" are likely to be conservative. In this paper, we seek to reveal political polarities in social-media text data and to quantify these polarities by explicitly assigning a polarity score to entities and hashtags. Although this idea is straightforward, it is difficult to perform such inference in a trustworthy quantitative way. Key challenges include the small number of known labels, the continuous spectrum of political views, and the preservation of both a polarity score and a polarity-neutral semantic meaning in an embedding vector of words. To attempt to overcome these challenges, we propose the Polarity-aware Embedding Multi-task learning (PEM) model. This model consists of (1) a self-supervised context-preservation task, (2) an attention-based tweet-level polarity-inference task, and (3) an adversarial learning task that promotes independence between an embedding's polarity dimension and its semantic dimensions. Our experimental results demonstrate that our PEM model can successfully learn polarity-aware embeddings that perform well classification tasks. We examine a variety of applications and we thereby demonstrate the effectiveness of our PEM model. We also discuss important limitations of our work and encourage caution when applying the it to real-world scenarios.  ( 3 min )
    Keypoint-GraspNet: Keypoint-based 6-DoF Grasp Generation from the Monocular RGB-D input. (arXiv:2209.08752v3 [cs.RO] UPDATED)
    Great success has been achieved in the 6-DoF grasp learning from the point cloud input, yet the computational cost due to the point set orderlessness remains a concern. Alternatively, we explore the grasp generation from the RGB-D input in this paper. The proposed solution, Keypoint-GraspNet, detects the projection of the gripper keypoints in the image space and then recover the SE(3) poses with a PnP algorithm. A synthetic dataset based on the primitive shape and the grasp family is constructed to examine our idea. Metric-based evaluation reveals that our method outperforms the baselines in terms of the grasp proposal accuracy, diversity, and the time cost. Finally, robot experiments show high success rate, demonstrating the potential of the idea in the real-world applications.  ( 2 min )
    Scaling Novel Object Detection with Weakly Supervised Detection Transformers. (arXiv:2207.05205v2 [cs.CV] UPDATED)
    A critical object detection task is finetuning an existing model to detect novel objects, but the standard workflow requires bounding box annotations which are time-consuming and expensive to collect. Weakly supervised object detection (WSOD) offers an appealing alternative, where object detectors can be trained using image-level labels. However, the practical application of current WSOD models is limited, as they only operate at small data scales and require multiple rounds of training and refinement. To address this, we propose the Weakly Supervised Detection Transformer, which enables efficient knowledge transfer from a large-scale pretraining dataset to WSOD finetuning on hundreds of novel objects. Additionally, we leverage pretrained knowledge to improve the multiple instance learning (MIL) framework often used in WSOD methods. Our experiments show that our approach outperforms previous state-of-the-art models on large-scale novel object detection datasets, and our scaling study reveals that class quantity is more important than image quantity for WSOD pretraining.  ( 2 min )
    Actor-Critic or Critic-Actor? A Tale of Two Time Scales. (arXiv:2210.04470v2 [cs.LG] UPDATED)
    We revisit the standard formulation of tabular actor-critic algorithm as a two time-scale stochastic approximation with value function computed on a faster time-scale and policy computed on a slower time-scale. This emulates policy iteration. We begin by observing that reversal of the time scales will in fact emulate value iteration and is a legitimate algorithm. We provide a proof of convergence and compare the two empirically with and without function approximation (with both linear and nonlinear function approximators) and observe that our proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.  ( 2 min )
    Towards More Objective Evaluation of Class Incremental Learning: Representation Learning Perspective. (arXiv:2206.08101v2 [cs.LG] UPDATED)
    Class incremental learning (CIL) is the process of continually learning new object classes from incremental data while not forgetting past learned classes. While the common method for evaluating CIL algorithms is based on average test accuracy for all learned classes, we argue that maximizing accuracy alone does not necessarily lead to effective CIL algorithms. In this paper, we experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning and propose a new analysis method. Our experiments show that most state-of-the-art algorithms prioritize high stability and do not significantly change the learned representation, and sometimes even learn a representation of lower quality than a naive baseline. However, we observe that these algorithms can still achieve high test accuracy because they learn a classifier that is closer to the optimal classifier. We also found that the base model learned in the first task varies in representation quality across different algorithms, and changes in the final performance were observed when each algorithm was trained under similar representation quality of the base model. Thus, we suggest that representation-level evaluation is an additional recipe for more objective evaluation and effective development of CIL algorithms.  ( 3 min )
    Deep learning-based conditional inpainting for restoration of artifact-affected 4D CT images. (arXiv:2203.06431v2 [physics.med-ph] UPDATED)
    4D CT imaging is an essential component of radiotherapy of thoracic/abdominal tumors. 4D CT images are, however, often affected by artifacts that compromise treatment planning quality. In this work, deep learning (DL)-based conditional inpainting is proposed to restore anatomically correct image information of artifact-affected areas. The restoration approach consists of a two-stage process: DL-based detection of common interpolation (INT) and double structure (DS) artifacts, followed by conditional inpainting applied to the artifact areas. In this context, conditional refers to a guidance of the inpainting process by patient-specific image data to ensure anatomically reliable results. The study is based on 65 in-house 4D CT images of lung cancer patients (48 with only slight artifacts, 17 with pronounced artifacts) and two publicly available 4D CT data sets that serve as independent external test sets. Automated artifact detection revealed a ROC-AUC of 0.99 for INT and of 0.97 for DS artifacts (in-house data). The proposed inpainting method decreased the average root mean squared error (RMSE) by 52%(INT) and 59% (DS) for the in-house data. For the external test data sets, the RMSE improvement is similar (50% and 59 %, respectively). Applied to 4D CT data with pronounced artifacts (not part of the training set), 72% of the detectable artifacts were removed. The results highlight the potential of DL-based inpainting for restoration of artifact-affected 4D CT data. Compared to recent 4D CT inpainting and restoration approaches, the proposed methodology illustrates the advantages of exploiting patient-specific prior image information.  ( 3 min )
    Push--Pull with Device Sampling. (arXiv:2206.04113v2 [math.OC] UPDATED)
    We consider decentralized optimization problems in which a number of agents collaborate to minimize the average of their local functions by exchanging over an underlying communication graph. Specifically, we place ourselves in an asynchronous model where only a random portion of nodes perform computation at each iteration, while the information exchange can be conducted between all the nodes and in an asymmetric fashion. For this setting, we propose an algorithm that combines gradient tracking with a network-level variance reduction (in contrast to variance reduction within each node). This enables each node to track the average of the gradients of the objective functions. Our theoretical analysis shows that the algorithm converges linearly, when the local objective functions are strongly convex, under mild connectivity conditions on the expected mixing matrices. In particular, our result does not require the mixing matrices to be doubly stochastic. In the experiments, we investigate a broadcast mechanism that transmits information from computing nodes to their neighbors, and confirm the linear convergence of our method on both synthetic and real-world datasets.  ( 2 min )
    No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation. (arXiv:2206.06015v2 [cs.GT] UPDATED)
    We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation -- that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that attains nearly the same guarantees as its non-adapted counterpart, while operating without knowledge of either the game or of the noise profile.  ( 3 min )
    Backpropagation through Combinatorial Algorithms: Identity with Projection Works. (arXiv:2205.15213v3 [cs.LG] UPDATED)
    Embedding discrete solvers as differentiable layers has given modern deep learning architectures combinatorial expressivity and discrete reasoning capabilities. The derivative of these solvers is zero or undefined, therefore a meaningful replacement is crucial for effective gradient-based learning. Prior works rely on smoothing the solver with input perturbations, relaxing the solver to continuous problems, or interpolating the loss landscape with techniques that typically require additional solver calls, introduce extra hyper-parameters, or compromise performance. We propose a principled approach to exploit the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass and further provide a theoretical justification. Our experiments demonstrate that such a straightforward hyper-parameter-free approach is able to compete with previous more complex methods on numerous experiments such as backpropagation through discrete samplers, deep graph matching, and image retrieval. Furthermore, we substitute the previously proposed problem-specific and label-dependent margin with a generic regularization procedure that prevents cost collapse and increases robustness.  ( 3 min )
    Easy Differentially Private Linear Regression. (arXiv:2208.07353v2 [cs.LG] UPDATED)
    Linear regression is a fundamental tool for statistical analysis. This has motivated the development of linear regression methods that also satisfy differential privacy and thus guarantee that the learned model reveals little about any one data point used to construct it. However, existing differentially private solutions assume that the end user can easily specify good data bounds and hyperparameters. Both present significant practical obstacles. In this paper, we study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models. Given $n$ samples of $d$-dimensional data used to train $m$ models, we construct an efficient analogue using an approximate Tukey depth that runs in time $O(d^2n + dm\log(m))$. We find that this algorithm obtains strong empirical performance in the data-rich setting with no data bounds or hyperparameter selection required.  ( 2 min )
    A Recipe for Watermarking Diffusion Models. (arXiv:2303.10137v1 [cs.CV])
    Recently, diffusion models (DMs) have demonstrated their advantageous potential for generative tasks. Widespread interest exists in incorporating DMs into downstream applications, such as producing or editing photorealistic images. However, practical deployment and unprecedented power of DMs raise legal issues, including copyright protection and monitoring of generated content. In this regard, watermarking has been a proven solution for copyright protection and content monitoring, but it is underexplored in the DMs literature. Specifically, DMs generate samples from longer tracks and may have newly designed multimodal structures, necessitating the modification of conventional watermarking pipelines. To this end, we conduct comprehensive analyses and derive a recipe for efficiently watermarking state-of-the-art DMs (e.g., Stable Diffusion), via training from scratch or finetuning. Our recipe is straightforward but involves empirically ablated implementation details, providing a solid foundation for future research on watermarking DMs. Our Code: https://github.com/yunqing-me/WatermarkDM.  ( 2 min )
    Is Bio-Inspired Learning Better than Backprop? Benchmarking Bio Learning vs. Backprop. (arXiv:2212.04614v3 [cs.LG] UPDATED)
    Bio-inspired learning has been gaining popularity recently given that Backpropagation (BP) is not considered biologically plausible. Many algorithms have been proposed in the literature which are all more biologically plausible than BP. However, apart from overcoming the biological implausibility of BP, a strong motivation for using Bio-inspired algorithms remains lacking. In this study, we undertake a holistic comparison of BP vs. multiple Bio-inspired algorithms to answer the question of whether Bio-learning offers additional benefits over BP. We test Bio-algorithms under different design choices such as access to only partial training data, resource constraints in terms of the number of training epochs, sparsification of the neural network parameters and addition of noise to input samples. Through these experiments, we notably find two key advantages of Bio-algorithms over BP. Firstly, Bio-algorithms perform much better than BP when the entire training dataset is not supplied. Four of the five Bio-algorithms tested outperform BP by upto 5% accuracy when only 20% of the training dataset is available. Secondly, even when the full dataset is available, Bio-algorithms learn much quicker and converge to a stable accuracy in far lesser training epochs than BP. Hebbian learning, specifically, is able to learn in just 5 epochs compared to around 100 epochs required by BP. These insights present practical reasons for utilising Bio-learning beyond just their biological plausibility and also point towards interesting new directions for future work on Bio-learning.  ( 3 min )
    LAVA: Granular Neuron-Level Explainable AI for Alzheimer's Disease Assessment from Fundus Images. (arXiv:2302.03008v2 [cs.LG] UPDATED)
    Alzheimer's Disease (AD) is a progressive neurodegenerative disease and the leading cause of dementia. Early diagnosis is critical for patients to benefit from potential intervention and treatment. The retina has been hypothesized as a diagnostic site for AD detection owing to its anatomical connection with the brain. Developed AI models for this purpose have yet to provide a rational explanation about the decision and neither infer the stage of disease's progression. Along this direction, we propose a novel model-agnostic explainable-AI framework, called Granular Neuron-level Explainer (LAVA), an interpretation prototype that probes into intermediate layers of the Convolutional Neural Network (CNN) models to assess the AD continuum directly from the retinal imaging without longitudinal or clinical evaluation. This method is applied to validate the retinal vasculature as a biomarker and diagnostic modality for Alzheimer's Disease (AD) evaluation. UK Biobank cognitive tests and vascular morphological features suggest LAVA shows strong promise and effectiveness in identifying AD stages across the progression continuum.  ( 2 min )
    Towards Reliable Neural Specifications. (arXiv:2210.16114v5 [cs.LG] UPDATED)
    Having reliable specifications is an unavoidable challenge in achieving verifiable correctness, robustness, and interpretability of AI systems. Existing specifications for neural networks are in the paradigm of data as specification. That is, the local neighborhood centering around a reference input is considered to be correct (or robust). While existing specifications contribute to verifying adversarial robustness, a significant problem in many research domains, our empirical study shows that those verified regions are somewhat tight, and thus fail to allow verification of test set inputs, making them impractical for some real-world applications. To this end, we propose a new family of specifications called neural representation as specification, which uses the intrinsic information of neural networks - neural activation patterns (NAPs), rather than input data to specify the correctness and/or robustness of neural network predictions. We present a simple statistical approach to mining neural activation patterns. To show the effectiveness of discovered NAPs, we formally verify several important properties, such as various types of misclassifications will never happen for a given NAP, and there is no ambiguity between different NAPs. We show that by using NAP, we can verify a significant region of the input space, while still recalling 84% of the data on MNIST. Moreover, we can push the verifiable bound to 10 times larger on the CIFAR10 benchmark. Thus, we argue that NAPs can potentially be used as a more reliable and extensible specification for neural network verification.  ( 3 min )
    Learning to Select Prototypical Parts for Interpretable Sequential Data Modeling. (arXiv:2212.03396v2 [cs.LG] UPDATED)
    Prototype-based interpretability methods provide intuitive explanations of model prediction by comparing samples to a reference set of memorized exemplars or typical representatives in terms of similarity. In the field of sequential data modeling, similarity calculations of prototypes are usually based on encoded representation vectors. However, due to highly recursive functions, there is usually a non-negligible disparity between the prototype-based explanations and the original input. In this work, we propose a Self-Explaining Selective Model (SESM) that uses a linear combination of prototypical concepts to explain its own predictions. The model employs the idea of case-based reasoning by selecting sub-sequences of the input that mostly activate different concepts as prototypical parts, which users can compare to sub-sequences selected from different example inputs to understand model decisions. For better interpretability, we design multiple constraints including diversity, stability, and locality as training objectives. Extensive experiments in different domains demonstrate that our method exhibits promising interpretability and competitive accuracy.  ( 2 min )
    Collision Cross-entropy and EM Algorithm for Self-labeled Classification. (arXiv:2303.07321v2 [cs.LG] UPDATED)
    We propose "collision cross-entropy" as a robust alternative to the Shannon's cross-entropy in the context of self-labeled classification with posterior models. Assuming unlabeled data, self-labeling works by estimating latent pseudo-labels, categorical distributions y, that optimize some discriminative clustering criteria, e.g. "decisiveness" and "fairness". All existing self-labeled losses incorporate Shannon's cross-entropy term targeting the model prediction, softmax, at the estimated distribution y. In fact, softmax is trained to mimic the uncertainty in y exactly. Instead, we propose the negative log-likelihood of "collision" to maximize the probability of equality between two random variables represented by distributions softmax and y. We show that our loss satisfies some properties of a generalized cross-entropy. Interestingly, it agrees with the Shannon's cross-entropy for one-hot pseudo-labels y, but the training from softer labels weakens. For example, if y is a uniform distribution at some data point, it has zero contribution to the training. Our self-labeling loss combining collision cross entropy with basic clustering criteria is convex w.r.t. pseudo-labels, but non-trivial to optimize over the probability simplex. We derive a practical EM algorithm optimizing pseudo-labels y significantly faster than generic methods, e.g. the projectile gradient descent. The collision cross-entropy consistently improves the results on multiple self-labeled clustering examples using different DNNs.  ( 3 min )
    Observations on K-image Expansion of Image-Mixing Augmentation for Classification. (arXiv:2110.04248v2 [cs.CV] UPDATED)
    Image-mixing augmentations (e.g., Mixup and CutMix), which typically involve mixing two images, have become the de-facto training techniques for image classification. Despite their huge success in image classification, the number of images to be mixed has not been elucidated in the literature: only the naive K-image expansion has been shown to lead to performance degradation. This study derives a new K-image mixing augmentation based on the stick-breaking process under Dirichlet prior distribution. We demonstrate the superiority of our K-image expansion augmentation over conventional two-image mixing augmentation methods through extensive experiments and analyses: (1) more robust and generalized classifiers; (2) a more desirable loss landscape shape; (3) better adversarial robustness. Moreover, we show that our probabilistic model can measure the sample-wise uncertainty and boost the efficiency for network architecture search by achieving a 7-fold reduction in the search time. Code will be available at https://github.com/yjyoo3312/DCutMix-PyTorch.git.  ( 2 min )
    An Approach for Noisy, Crowdsourced Datasets Utilizing Ensemble Modeling, Normalized Distributions of Annotations, and Entropic Measures of Uncertainty. (arXiv:2210.16380v2 [cs.CV] UPDATED)
    Performing classification on noisy, crowdsourced image datasets can prove challenging even for the best neural networks. Two issues which complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling. The AL-ALL and AL-PUB datasets -- consisting of tightly cropped, individual characters from images of ancient Greek papyri -- are strongly affected by both issues. The application of ensemble modeling to such datasets can help identify images where the ground-truth is questionable and quantify the trustworthiness of those samples. As such, we apply stacked generalization consisting of nearly identical ResNets with different loss functions: one utilizing sparse cross-entropy (CXE) and the other Kullback-Liebler Divergence (KLD). Both networks use labels drawn from the crowdsourced consensus. For the second network, the KLD is calculated with respect to the proposed Normalized Distribution of Annotations (NDA). For our ensemble model, we apply a k-nearest neighbors model to the outputs of the CXE and KLD networks. Individually, the ResNet models have approximately 93% accuracy, while the ensemble model achieves an accuracy of > 95%. We also perform an analysis of the Shannon entropy of the various models' output distributions to measure classification uncertainty. Our results suggest that entropy is useful for predicting model misclassifications.  ( 3 min )
    HUMUS-Net: Hybrid unrolled multi-scale network architecture for accelerated MRI reconstruction. (arXiv:2203.08213v2 [eess.IV] UPDATED)
    In accelerated MRI reconstruction, the anatomy of a patient is recovered from a set of under-sampled and noisy measurements. Deep learning approaches have been proven to be successful in solving this ill-posed inverse problem and are capable of producing very high quality reconstructions. However, current architectures heavily rely on convolutions, that are content-independent and have difficulties modeling long-range dependencies in images. Recently, Transformers, the workhorse of contemporary natural language processing, have emerged as powerful building blocks for a multitude of vision tasks. These models split input images into non-overlapping patches, embed the patches into lower-dimensional tokens and utilize a self-attention mechanism that does not suffer from the aforementioned weaknesses of convolutional architectures. However, Transformers incur extremely high compute and memory cost when 1) the input image resolution is high and 2) when the image needs to be split into a large number of patches to preserve fine detail information, both of which are typical in low-level vision problems such as MRI reconstruction, having a compounding effect. To tackle these challenges, we propose HUMUS-Net, a hybrid architecture that combines the beneficial implicit bias and efficiency of convolutions with the power of Transformer blocks in an unrolled and multi-scale network. HUMUS-Net extracts high-resolution features via convolutional blocks and refines low-resolution features via a novel Transformer-based multi-scale feature extractor. Features from both levels are then synthesized into a high-resolution output reconstruction. Our network establishes new state of the art on the largest publicly available MRI dataset, the fastMRI dataset. We further demonstrate the performance of HUMUS-Net on two other popular MRI datasets and perform fine-grained ablation studies to validate our design.  ( 3 min )
    Distill n' Explain: explaining graph neural networks using simple surrogates. (arXiv:2303.10139v1 [cs.LG])
    Explaining node predictions in graph neural networks (GNNs) often boils down to finding graph substructures that preserve predictions. Finding these structures usually implies back-propagating through the GNN, bonding the complexity (e.g., number of layers) of the GNN to the cost of explaining it. This naturally begs the question: Can we break this bond by explaining a simpler surrogate GNN? To answer the question, we propose Distill n' Explain (DnX). First, DnX learns a surrogate GNN via knowledge distillation. Then, DnX extracts node or edge-level explanations by solving a simple convex program. We also propose FastDnX, a faster version of DnX that leverages the linear decomposition of our surrogate model. Experiments show that DnX and FastDnX often outperform state-of-the-art GNN explainers while being orders of magnitude faster. Additionally, we support our empirical findings with theoretical results linking the quality of the surrogate model (i.e., distillation error) to the faithfulness of explanations.  ( 2 min )
    MassNet: A Deep Learning Approach for Body Weight Extraction from A Single Pressure Image. (arXiv:2303.10136v1 [cs.HC])
    Body weight, as an essential physiological trait, is of considerable significance in many applications like body management, rehabilitation, and drug dosing for patient-specific treatments. Previous works on the body weight estimation task are mainly vision-based, using 2D/3D, depth, or infrared images, facing problems in illumination, occlusions, and especially privacy issues. The pressure mapping mattress is a non-invasive and privacy-preserving tool to obtain the pressure distribution image over the bed surface, which strongly correlates with the body weight of the lying person. To extract the body weight from this image, we propose a deep learning-based model, including a dual-branch network to extract the deep features and pose features respectively. A contrastive learning module is also combined with the deep-feature branch to help mine the mutual factors across different postures of every single subject. The two groups of features are then concatenated for the body weight regression task. To test the model's performance over different hardware and posture settings, we create a pressure image dataset of 10 subjects and 23 postures, using a self-made pressure-sensing bedsheet. This dataset, which is made public together with this paper, together with a public dataset, are used for the validation. The results show that our model outperforms the state-of-the-art algorithms over both 2 datasets. Our research constitutes an important step toward fully automatic weight estimation in both clinical and at-home practice. Our dataset is available for research purposes at: https://github.com/USTCWzy/MassEstimation.  ( 3 min )
    Liability regimes in the age of AI: a use-case driven analysis of the burden of proof. (arXiv:2211.01817v2 [cs.AI] UPDATED)
    New emerging technologies powered by Artificial Intelligence (AI) have the potential to disruptively transform our societies for the better. In particular, data-driven learning approaches (i.e., Machine Learning (ML)) have been a true revolution in the advancement of multiple technologies in various application domains. But at the same time there is growing concern about certain intrinsic characteristics of these methodologies that carry potential risks to both safety and fundamental rights. Although there are mechanisms in the adoption process to minimize these risks (e.g., safety regulations), these do not exclude the possibility of harm occurring, and if this happens, victims should be able to seek compensation. Liability regimes will therefore play a key role in ensuring basic protection for victims using or interacting with these systems. However, the same characteristics that make AI systems inherently risky, such as lack of causality, opacity, unpredictability or their self and continuous learning capabilities, may lead to considerable difficulties when it comes to proving causation. This paper presents three case studies, as well as the methodology to reach them, that illustrate these difficulties. Specifically, we address the cases of cleaning robots, delivery drones and robots in education. The outcome of the proposed analysis suggests the need to revise liability regimes to alleviate the burden of proof on victims in cases involving AI technologies.  ( 3 min )
    AirTrack: Onboard Deep Learning Framework for Long-Range Aircraft Detection and Tracking. (arXiv:2209.12849v2 [cs.CV] UPDATED)
    Detect-and-Avoid (DAA) capabilities are critical for safe operations of unmanned aircraft systems (UAS). This paper introduces, AirTrack, a real-time vision-only detect and tracking framework that respects the size, weight, and power (SWaP) constraints of sUAS systems. Given the low Signal-to-Noise ratios (SNR) of far away aircraft, we propose using full resolution images in a deep learning framework that aligns successive images to remove ego-motion. The aligned images are then used downstream in cascaded primary and secondary classifiers to improve detection and tracking performance on multiple metrics. We show that AirTrack outperforms state-of-the art baselines on the Amazon Airborne Object Tracking (AOT) Dataset. Multiple real world flight tests with a Cessna 182 interacting with general aviation traffic and additional near-collision flight tests with a Bell helicopter flying towards a UAS in a controlled setting showcase that the proposed approach satisfies the newly introduced ASTM F3442/F3442M standard for DAA. Empirical evaluations show that our system has a probability of track of more than 95% up to a range of 700m. Video available at https://youtu.be/H3lL_Wjxjpw .  ( 3 min )
    QUBO Decision Tree: Annealing Machine Extends Decision Tree Splitting. (arXiv:2303.09772v1 [cs.LG])
    This paper proposes an extension of regression trees by quadratic unconstrained binary optimization (QUBO). Regression trees are very popular prediction models that are trainable with tabular datasets, but their accuracy is insufficient because the decision rules are too simple. The proposed method extends the decision rules in decision trees to multi-dimensional boundaries. Such an extension is generally unimplementable because of computational limitations, however, the proposed method transforms the training process to QUBO, which enables an annealing machine to solve this problem.  ( 2 min )
    Data-centric Artificial Intelligence: A Survey. (arXiv:2303.10158v1 [cs.LG])
    Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler of its great success is the availability of abundant and high-quality data for building machine learning models. Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI. The attention of researchers and practitioners has gradually shifted from advancing model design to enhancing the quality and quantity of the data. In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals (training data development, inference data development, and data maintenance) and the representative methods. We also organize the existing literature from automation and collaboration perspectives, discuss the challenges, and tabulate the benchmarks for various tasks. We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle. We hope it can help the readers efficiently grasp a broad picture of this field, and equip them with the techniques and further research ideas to systematically engineer data for building AI systems. A companion list of data-centric AI resources will be regularly updated on https://github.com/daochenzha/data-centric-AI  ( 2 min )
    Learning time-scales in two-layers neural networks. (arXiv:2303.00055v2 [cs.LG] UPDATED)
    Gradient-based learning in multi-layer neural networks displays a number of striking features. In particular, the decrease rate of empirical risk is non-monotone even after averaging over large batches. Long plateaus in which one observes barely any progress alternate with intervals of rapid decrease. These successive phases of learning often take place on very different time scales. Finally, models learnt in an early phase are typically `simpler' or `easier to learn' although in a way that is difficult to formalize. Although theoretical explanations of these phenomena have been put forward, each of them captures at best certain specific regimes. In this paper, we study the gradient flow dynamics of a wide two-layer neural network in high-dimension, when data are distributed according to a single-index model (i.e., the target function depends on a one-dimensional projection of the covariates). Based on a mixture of new rigorous results, non-rigorous mathematical derivations, and numerical simulations, we propose a scenario for the learning dynamics in this setting. In particular, the proposed evolution exhibits separation of timescales and intermittency. These behaviors arise naturally because the population gradient flow can be recast as a singularly perturbed dynamical system.  ( 2 min )
    Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices. (arXiv:2303.10019v1 [stat.ML])
    This paper presents a new method for combining (or aggregating or ensembling) multivariate probabilistic forecasts, taking into account dependencies between quantiles and covariates through a smoothing procedure that allows for online learning. Two smoothing methods are discussed: dimensionality reduction using Basis matrices and penalized smoothing. The new online learning algorithm generalizes the standard CRPS learning framework into multivariate dimensions. It is based on Bernstein Online Aggregation (BOA) and yields optimal asymptotic learning properties. We provide an in-depth discussion on possible extensions of the algorithm and several nested cases related to the existing literature on online forecast combination. The methodology is applied to forecasting day-ahead electricity prices, which are 24-dimensional distributional forecasts. The proposed method yields significant improvements over uniform combination in terms of continuous ranked probability score (CRPS). We discuss the temporal evolution of the weights and hyperparameters and present the results of reduced versions of the preferred model. A fast C++ implementation of all discussed methods is provided in the R-Package profoc.  ( 3 min )
    An Isolation-Aware Online Virtual Network Embedding via Deep Reinforcement Learning. (arXiv:2211.14158v2 [cs.NI] UPDATED)
    Virtualization technologies are the foundation of modern ICT infrastructure, enabling service providers to create dedicated virtual networks (VNs) that can support a wide range of smart city applications. These VNs continuously generate massive amounts of data, necessitating stringent reliability and security requirements. In virtualized network environments, however, multiple VNs may coexist on the same physical infrastructure and, if not properly isolated, may interfere with or provide unauthorized access to one another. The former causes performance degradation, while the latter compromises the security of VNs. Service assurance for infrastructure providers becomes significantly more complicated when a specific VN violates the isolation requirement. In an effort to address the isolation issue, this paper proposes isolation during virtual network embedding (VNE), the procedure of allocating VNs onto physical infrastructure. We define a simple abstracted concept of isolation levels to capture the variations in isolation requirements and then formulate isolation-aware VNE as an optimization problem with resource and isolation constraints. A deep reinforcement learning (DRL)-based VNE algorithm ISO-DRL_VNE, is proposed that considers resource and isolation constraints and is compared to the existing three state-of-the-art algorithms: NodeRank, Global Resource Capacity (GRC), and Mote-Carlo Tree Search (MCTS). Evaluation results show that the ISO-DRL_VNE algorithm outperforms others in acceptance ratio, long-term average revenue, and long-term average revenue-to-cost ratio by 6%, 13%, and 15%.  ( 3 min )
    Investigating Stateful Defenses Against Black-Box Adversarial Examples. (arXiv:2303.06280v2 [cs.CR] UPDATED)
    Defending machine-learning (ML) models against white-box adversarial attacks has proven to be extremely difficult. Instead, recent work has proposed stateful defenses in an attempt to defend against a more restricted black-box attacker. These defenses operate by tracking a history of incoming model queries, and rejecting those that are suspiciously similar. The current state-of-the-art stateful defense Blacklight was proposed at USENIX Security '22 and claims to prevent nearly 100% of attacks on both the CIFAR10 and ImageNet datasets. In this paper, we observe that an attacker can significantly reduce the accuracy of a Blacklight-protected classifier (e.g., from 82.2% to 6.4% on CIFAR10) by simply adjusting the parameters of an existing black-box attack. Motivated by this surprising observation, since existing attacks were evaluated by the Blacklight authors, we provide a systematization of stateful defenses to understand why existing stateful defense models fail. Finally, we propose a stronger evaluation strategy for stateful defenses comprised of adaptive score and hard-label based black-box attacks. We use these attacks to successfully reduce even reconfigured versions of Blacklight to as low as 0% robust accuracy.  ( 2 min )
    Towards AI-controlled FES-restoration of arm movements: Controlling for progressive muscular fatigue with Gaussian state-space models. (arXiv:2301.04005v2 [eess.SY] UPDATED)
    Reaching disability limits an individual's ability in performing daily tasks. Surface Functional Electrical Stimulation (FES) offers a non-invasive solution to restore the lost abilities. However, inducing desired movements using FES is still an open engineering problem. This problem is accentuated by the complexities of human arms' neuromechanics and the variations across individuals. Reinforcement Learning (RL) emerges as a promising approach to govern customised control rules for different subjects and settings. Yet, one remaining challenge of using RL to control FES is unobservable muscle fatigue that progressively changes as an unknown function of the stimulation, breaking the Markovian assumption of RL. In this work, we present a method to address the unobservable muscle fatigue issue, allowing our RL controller to achieve higher control performances. Our method is based on a Gaussian State-Space Model (GSSM) that utilizes recurrent neural networks to learn Markovian state-spaces from partial observations. The GSSM is used as a filter that converts the observations into the state-space representation for RL to preserve the Markovian assumption. Here, we start with presenting the modification of the original GSSM to address an overconfident issue. We then present the interaction between RL and the modified GSSM, followed by the setup for FES control learning. We test our RL-GSSM system on a planar reaching setting in simulation using a detailed neuromechanical model and show that the GSSM can help RL maintain its control performance against the fatigue.  ( 3 min )
    Treeformer: Dense Gradient Trees for Efficient Attention Computation. (arXiv:2208.09015v2 [cs.CL] UPDATED)
    Standard inference and training with transformer based architectures scale quadratically with input sequence length. This is prohibitively large for a variety of applications especially in web-page translation, query-answering etc. Consequently, several approaches have been developed recently to speedup attention computation by enforcing different attention structures such as sparsity, low-rank, approximating attention using kernels. In this work, we view attention computation as that of nearest neighbor retrieval, and use decision tree based hierarchical navigation to reduce the retrieval cost per query token from linear in sequence length to nearly logarithmic. Based on such hierarchical navigation, we design Treeformer which can use one of two efficient attention layers -- TF-Attention and TC-Attention. TF-Attention computes the attention in a fine-grained style, while TC-Attention is a coarse attention layer which also ensures that the gradients are "dense". To optimize such challenging discrete layers, we propose a two-level bootstrapped training method. Using extensive experiments on standard NLP benchmarks, especially for long-sequences, we demonstrate that our Treeformer architecture can be almost as accurate as baseline Transformer while using 30x lesser FLOPs in the attention layer. Compared to Linformer, the accuracy can be as much as 12% higher while using similar FLOPs in the attention layer.  ( 3 min )
    Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning. (arXiv:2301.01113v2 [cs.SE] UPDATED)
    Automated program repair (APR) faces the challenge of test overfitting, where generated patches pass validation tests but fail to generalize. Existing methods for patch assessment involve generating new tests or manual inspection, which can be time-consuming or biased. In this paper, we propose a novel technique, INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR leverages program invariants to reason about program semantics while also capturing program syntax through language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if: (1) it violates correct specifications or (2) maintains erroneous behaviors from the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a trained model from labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is threefold. First, INVALIDATOR leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated, but instead only relies on the current test suite and uses invariant inference to generalize program behaviors. Third, INVALIDATOR is fully automated. Experimental results demonstrate that INVALIDATOR outperforms existing methods in terms of Accuracy and F-measure, correctly identifying 79% of overfitting patches and detecting 23% more overfitting patches than the best baseline.  ( 3 min )
    Online Reinforcement Learning in Periodic MDP. (arXiv:2303.09629v1 [cs.LG])
    We study learning in periodic Markov Decision Process (MDP), a special type of non-stationary MDP where both the state transition probabilities and reward functions vary periodically, under the average reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period $N$ and as $\mathcal{O}(\sqrt{Tlog T})$ with the horizon length $T$. Utilizing the information about the sparsity of transition matrix of augmented MDP, we propose another algorithm PUCRLB which enhances upon PUCRL2, both in terms of regret ($O(\sqrt{N})$ dependency on period) and empirical performance. Finally, we propose two other algorithms U-PUCRL2 and U-PUCRLB for extended uncertainty in the environment in which the period is unknown but a set of candidate periods are known. Numerical results demonstrate the efficacy of all the algorithms.  ( 2 min )
    A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs. (arXiv:2107.12824v4 [cs.LG] UPDATED)
    High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Due to its high computational complexity, it is challenging to deploy DNNs on edge devices with strict limitations on computational resources. In this paper, we derive a compact while highly-accurate DNN model, termed dsODENet, by combining recently-proposed parameter reduction techniques: Neural ODE (Ordinary Differential Equation) and DSC (Depthwise Separable Convolution). Neural ODE exploits a similarity between ResNet and ODE, and shares most of weight parameters among multiple layers, which greatly reduces the memory consumption. We apply dsODENet to a domain adaptation as a practical use case with image classification datasets. We also propose a resource-efficient FPGA-based design for dsODENet, where all the parameters and feature maps except for pre- and post-processing layers can be mapped onto on-chip memories. It is implemented on Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, inference speed, FPGA resource utilization, and speedup rate compared to a software counterpart. The results demonstrate that dsODENet achieves comparable or slightly better domain adaptation accuracy compared to our baseline Neural ODE implementation, while the total parameter size without pre- and post-processing layers is reduced by 54.2% to 79.8%. Our FPGA implementation accelerates the inference speed by 23.8 times.  ( 3 min )
    Towards AI-controlled FES-restoration of arm movements: neuromechanics-based reinforcement learning for 3-D reaching. (arXiv:2301.04004v2 [eess.SY] UPDATED)
    Reaching disabilities affect the quality of life. Functional Electrical Stimulation (FES) can restore lost motor functions. Yet, there remain challenges in controlling FES to induce desired movements. Neuromechanical models are valuable tools for developing FES control methods. However, focusing on the upper extremity areas, several existing models are either overly simplified or too computationally demanding for control purposes. Besides the model-related issues, finding a general method for governing the control rules for different tasks and subjects remains an engineering challenge. Here, we present our approach toward FES-based restoration of arm movements to address those fundamental issues in controlling FES. Firstly, we present our surface-FES-oriented neuromechanical models of human arms built using well-accepted, open-source software. The models are designed to capture significant dynamics in FES controls with minimal computational cost. Our models are customisable and can be used for testing different control methods. Secondly, we present the application of reinforcement learning (RL) as a general method for governing the control rules. In combination, our customisable models and RL-based control method open the possibility of delivering customised FES controls for different subjects and settings with minimal engineering intervention. We demonstrate our approach in planar and 3D settings.  ( 3 min )
    Performance is not enough: a story of the Rashomon's quartet. (arXiv:2302.13356v2 [stat.ML] UPDATED)
    Predictive modelling is often reduced to finding the best model that optimizes a selected performance measure. But what if the second-best model describes the data equally well but in a completely different way? What about the third? Is it possible that the most effective models learn completely different relationships in the data? Inspired by Anscombe's quartet, this paper introduces Rashomon's quartet, a synthetic dataset for which four models from different classes have practically identical predictive performance. However, their visualization reveals drastically distinct ways of understanding the correlation structure in data. The introduced simple illustrative example aims to further facilitate visualization as a mandatory tool to compare predictive models beyond their performance. We need to develop insightful techniques for the explanatory analysis of model sets.  ( 2 min )
    Comparative layer-wise analysis of self-supervised speech models. (arXiv:2211.03929v3 [cs.CL] UPDATED)
    Many self-supervised speech models, varying in their pre-training objective, input modality, and pre-training data, have been proposed in the last few years. Despite impressive successes on downstream tasks, we still have a limited understanding of the properties encoded by the models and the differences across models. In this work, we examine the intermediate representations for a variety of recent models. Specifically, we measure acoustic, phonetic, and word-level properties encoded in individual layers, using a lightweight analysis tool based on canonical correlation analysis (CCA). We find that these properties evolve across layers differently depending on the model, and the variations relate to the choice of pre-training objective. We further investigate the utility of our analyses for downstream tasks by comparing the property trends with performance on speech recognition and spoken language understanding tasks. We discover that CCA trends provide reliable guidance to choose layers of interest for downstream tasks and that single-layer performance often matches or improves upon using all layers, suggesting implications for more efficient use of pre-trained models.  ( 2 min )
    Semi-supervised Bladder Tissue Classification in Multi-Domain Endoscopic Images. (arXiv:2212.11375v2 [eess.IV] UPDATED)
    Objective: Accurate visual classification of bladder tissue during Trans-Urethral Resection of Bladder Tumor (TURBT) procedures is essential to improve early cancer diagnosis and treatment. During TURBT interventions, White Light Imaging (WLI) and Narrow Band Imaging (NBI) techniques are used for lesion detection. Each imaging technique provides diverse visual information that allows clinicians to identify and classify cancerous lesions. Computer vision methods that use both imaging techniques could improve endoscopic diagnosis. We address the challenge of tissue classification when annotations are available only in one domain, in our case WLI, and the endoscopic images correspond to an unpaired dataset, i.e. there is no exact equivalent for every image in both NBI and WLI domains. Method: We propose a semi-surprised Generative Adversarial Network (GAN)-based method composed of three main components: a teacher network trained on the labeled WLI data; a cycle-consistency GAN to perform unpaired image-to-image translation, and a multi-input student network. To ensure the quality of the synthetic images generated by the proposed GAN we perform a detailed quantitative, and qualitative analysis with the help of specialists. Conclusion: The overall average classification accuracy, precision, and recall obtained with the proposed method for tissue classification are 0.90, 0.88, and 0.89 respectively, while the same metrics obtained in the unlabeled domain (NBI) are 0.92, 0.64, and 0.94 respectively. The quality of the generated images is reliable enough to deceive specialists. Significance: This study shows the potential of using semi-supervised GAN-based bladder tissue classification when annotations are limited in multi-domain data. The dataset is available at https://zenodo.org/record/7741476#.ZBQUK7TMJ6k  ( 3 min )
    InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning. (arXiv:2303.07269v1 [cs.CV] CROSS LISTED)
    Recent state-of-the-art methods in imbalanced semi-supervised learning (SSL) rely on confidence-based pseudo-labeling with consistency regularization. To obtain high-quality pseudo-labels, a high confidence threshold is typically adopted. However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, and thus, the pseudo-labels for even high-confidence unlabeled samples may still be unreliable. In this work, we present a new perspective of pseudo-labeling for imbalanced SSL. Without relying on model confidence, we propose to measure whether an unlabeled sample is likely to be ``in-distribution''; i.e., close to the current training data. To decide whether an unlabeled sample is ``in-distribution'' or ``out-of-distribution'', we adopt the energy score from out-of-distribution detection literature. As training progresses and more unlabeled samples become in-distribution and contribute to training, the combined labeled and pseudo-labeled data can better approximate the true class distribution to improve the model. Experiments demonstrate that our energy-based pseudo-labeling method, \textbf{InPL}, albeit conceptually simple, significantly outperforms confidence-based methods on imbalanced SSL benchmarks. For example, it produces around 3\% absolute accuracy improvement on CIFAR10-LT. When combined with state-of-the-art long-tailed SSL methods, further improvements are attained. In particular, in one of the most challenging scenarios, InPL achieves a 6.9\% accuracy improvement over the best competitor.  ( 3 min )
    SIRL: Similarity-based Implicit Representation Learning. (arXiv:2301.00810v3 [cs.RO] UPDATED)
    When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task ``features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.  ( 3 min )
    CausalEGM: a general causal inference framework by encoding generative modeling. (arXiv:2212.05925v4 [stat.ML] UPDATED)
    Although understanding and characterizing causal effects have become essential in observational studies, it is challenging when the confounders are high-dimensional. In this article, we develop a general framework $\textit{CausalEGM}$ for estimating causal effects by encoding generative modeling, which can be applied in both binary and continuous treatment settings. Under the potential outcome framework with unconfoundedness, we establish a bidirectional transformation between the high-dimensional confounders space and a low-dimensional latent space where the density is known (e.g., multivariate normal distribution). Through this, CausalEGM simultaneously decouples the dependencies of confounders on both treatment and outcome and maps the confounders to the low-dimensional latent space. By conditioning on the low-dimensional latent features, CausalEGM can estimate the causal effect for each individual or the average causal effect within a population. Our theoretical analysis shows that the excess risk for CausalEGM can be bounded through empirical process theory. Under an assumption on encoder-decoder networks, the consistency of the estimate can be guaranteed. In a series of experiments, CausalEGM demonstrates superior performance over existing methods for both binary and continuous treatments. Specifically, we find CausalEGM to be substantially more powerful than competing methods in the presence of large sample sizes and high dimensional confounders. The software of CausalEGM is freely available at https://github.com/SUwonglab/CausalEGM.  ( 3 min )
    Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers. (arXiv:2301.11578v2 [cs.LG] UPDATED)
    Since the recent advent of regulations for data protection (e.g., the General Data Protection Regulation), there has been increasing demand in deleting information learned from sensitive data in pre-trained models without retraining from scratch. The inherent vulnerability of neural networks towards adversarial attacks and unfairness also calls for a robust method to remove or correct information in an instance-wise fashion, while retaining the predictive performance across remaining data. To this end, we define instance-wise unlearning, of which the goal is to delete information on a set of instances from a pre-trained model, by either misclassifying each instance away from its original prediction or relabeling the instance to a different label. We also propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information. Both methods only require the pre-trained model and data instances to forget, allowing painless application to real-life settings where the entire training set is unavailable. Through extensive experimentation on various image classification benchmarks, we show that our approach effectively preserves knowledge of remaining data while unlearning given instances in both single-task and continual unlearning scenarios.  ( 3 min )
    Transformer Encoder with Multiscale Deep Learning for Pain Classification Using Physiological Signals. (arXiv:2303.06845v2 [cs.LG] UPDATED)
    Pain is a serious worldwide health problem that affects a vast proportion of the population. For efficient pain management and treatment, accurate classification and evaluation of pain severity are necessary. However, this can be challenging as pain is a subjective sensation-driven experience. Traditional techniques for measuring pain intensity, e.g. self-report scales, are susceptible to bias and unreliable in some instances. Consequently, there is a need for more objective and automatic pain intensity assessment strategies. In this paper, we develop PainAttnNet (PAN), a novel transfomer-encoder deep-learning framework for classifying pain intensities with physiological signals as input. The proposed approach is comprised of three feature extraction architectures: multiscale convolutional networks (MSCN), a squeeze-and-excitation residual network (SEResNet), and a transformer encoder block. On the basis of pain stimuli, MSCN extracts short- and long-window information as well as sequential features. SEResNet highlights relevant extracted features by mapping the interdependencies among features. The third module employs a transformer encoder consisting of three temporal convolutional networks (TCN) with three multi-head attention (MHA) layers to extract temporal dependencies from the features. Using the publicly available BioVid pain dataset, we test the proposed PainAttnNet model and demonstrate that our outcomes outperform state-of-the-art models. These results confirm that our approach can be utilized for automated classification of pain intensity using physiological signals to improve pain management and treatment.  ( 3 min )
    Significantly Improving Zero-Shot X-ray Pathology Classification via Fine-tuning Pre-trained Image-Text Encoders. (arXiv:2212.07050v2 [cs.LG] UPDATED)
    Deep neural networks have been successfully adopted to diverse domains including pathology classification based on medical images. However, large-scale and high-quality data to train powerful neural networks are rare in the medical domain as the labeling must be done by qualified experts. Researchers recently tackled this problem with some success by taking advantage of models pre-trained on large-scale general domain data. Specifically, researchers took contrastive image-text encoders (e.g., CLIP) and fine-tuned it with chest X-ray images and paired reports to perform zero-shot pathology classification, thus completely removing the need for pathology-annotated images to train a classification model. Existing studies, however, fine-tuned the pre-trained model with the same contrastive learning objective, and failed to exploit the multi-labeled nature of medical image-report pairs. In this paper, we propose a new fine-tuning strategy based on sentence sampling and positive pair loss relaxation for improving the downstream zero-shot pathology classification performance, which can be applied to any pre-trained contrastive image-text encoders. Our method consistently showed dramatically improved zero-shot pathology classification performance on four different chest X-ray datasets and 3 different pre-trained models (5.77% average AUROC increase). In particular, fine-tuning CLIP with our method showed much comparable or marginally outperformed to board-certified radiologists (0.619 vs 0.625 in F1 score and 0.530 vs 0.544 in MCC) in zero-shot classification of five prominent diseases from the CheXpert dataset.  ( 3 min )
    An Interpretable Machine Learning System to Identify EEG Patterns on the Ictal-Interictal-Injury Continuum. (arXiv:2211.05207v3 [cs.CV] UPDATED)
    In many medical subfields, there is a call for greater interpretability in the machine learning systems used for clinical work. In this paper, we design an interpretable deep learning model to predict the presence of 6 types of brainwave patterns (Seizure, LPD, GPD, LRDA, GRDA, other) commonly encountered in ICU EEG monitoring. Each prediction is accompanied by a high-quality explanation delivered with the assistance of a specialized user interface. This novel model architecture learns a set of prototypical examples (``prototypes'') and makes decisions by comparing a new EEG segment to these prototypes. These prototypes are either single-class (affiliated with only one class) or dual-class (affiliated with two classes). We present three main ways of interpreting the model: 1) Using global-structure preserving methods, we map the 1275-dimensional cEEG latent features to a 2D space to visualize the ictal-interictal-injury continuum and gain insight into its high-dimensional structure. 2) Predictions are made using case-based reasoning, inherently providing explanations of the form ``this EEG looks like that EEG.'' 3) We map the model decisions to a 2D space, allowing a user to see how the current sample prediction compares to the distribution of predictions made by the model. Our model performs better than the corresponding uninterpretable (black box) model with $p<0.01$ for discriminatory performance metrics AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve), as well as for task-specific interpretability metrics. We provide videos of the user interface exploring the 2D embedded space, providing the first global overview of the structure of ictal-interictal-injury continuum brainwave patterns. Our interpretable model and specialized user interface can act as a reference for practitioners who work with cEEG patterns.  ( 3 min )
    Pattern Attention Transformer with Doughnut Kernel. (arXiv:2211.16961v3 [cs.CV] UPDATED)
    We present in this paper a new architecture, the Pattern Attention Transformer (PAT), that is composed of the new doughnut kernel. Compared with tokens in the NLP field, Transformer in computer vision has the problem of handling the high resolution of pixels in images. In ViT, an image is cut into square-shaped patches. As the follow-up of ViT, Swin Transformer proposes an additional step of shifting to decrease the existence of fixed boundaries, which also incurs 'two connected Swin Transformer blocks' as the minimum unit of the model. Inheriting the patch/window idea, our doughnut kernel enhances the design of patches further. It replaces the line-cut boundaries with two types of areas: sensor and updating, which is based on the comprehension of self-attention (named QKVA grid). The doughnut kernel also brings a new topic about the shape of kernels beyond square. To verify its performance on image classification, PAT is designed with Transformer blocks of regular octagon shape doughnut kernels. Its architecture is lighter: the minimum pattern attention layer is only one for each stage. Under similar complexity of computation, its performances on ImageNet 1K reach higher throughput (+10%) and surpass Swin Transformer (+0.1 acc1).  ( 2 min )
    Distributional Reinforcement Learning with Unconstrained Monotonic Neural Networks. (arXiv:2106.03228v3 [cs.LG] UPDATED)
    The distributional reinforcement learning (RL) approach advocates for representing the complete probability distribution of the random return instead of only modelling its expectation. A distributional RL algorithm may be characterised by two main components, namely the representation of the distribution together with its parameterisation and the probability metric defining the loss. The present research work considers the unconstrained monotonic neural network (UMNN) architecture, a universal approximator of continuous monotonic functions which is particularly well suited for modelling different representations of a distribution. This property enables the efficient decoupling of the effect of the function approximator class from that of the probability metric. The research paper firstly introduces a methodology for learning different representations of the random return distribution (PDF, CDF and QF). Secondly, a novel distributional RL algorithm named unconstrained monotonic deep Q-network (UMDQN) is presented. To the authors' knowledge, it is the first distributional RL method supporting the learning of three, valid and continuous representations of the random return distribution. Lastly, in light of this new algorithm, an empirical comparison is performed between three probability quasi-metrics, namely the Kullback-Leibler divergence, Cramer distance, and Wasserstein distance. The results highlight the main strengths and weaknesses associated with each probability metric together with an important limitation of the Wasserstein distance.  ( 3 min )
    Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting. (arXiv:2303.10144v1 [cs.LG])
    Early stopping based on the validation set performance is a popular approach to find the right balance between under- and overfitting in the context of supervised learning. However, in reinforcement learning, even for supervised sub-problems such as world model learning, early stopping is not applicable as the dataset is continually evolving. As a solution, we propose a new general method that dynamically adjusts the update to data (UTD) ratio during training based on under- and overfitting detection on a small subset of the continuously collected experience not used for training. We apply our method to DreamerV2, a state-of-the-art model-based reinforcement learning algorithm, and evaluate it on the DeepMind Control Suite and the Atari $100$k benchmark. The results demonstrate that one can better balance under- and overestimation by adjusting the UTD ratio with our approach compared to the default setting in DreamerV2 and that it is competitive with an extensive hyperparameter search which is not feasible for many applications. Our method eliminates the need to set the UTD hyperparameter by hand and even leads to a higher robustness with regard to other learning-related hyperparameters further reducing the amount of necessary tuning.  ( 2 min )
    Partition-Based Active Learning for Graph Neural Networks. (arXiv:2201.09391v2 [cs.LG] UPDATED)
    We study the problem of semi-supervised learning with Graph Neural Networks (GNNs) in an active learning setup. We propose GraphPart, a novel partition-based active learning approach for GNNs. GraphPart first splits the graph into disjoint partitions and then selects representative nodes within each partition to query. The proposed method is motivated by a novel analysis of the classification error under realistic smoothness assumptions over the graph and the node features. Extensive experiments on multiple benchmark datasets demonstrate that the proposed method outperforms existing active learning methods for GNNs under a wide range of annotation budget constraints. In addition, the proposed method does not introduce additional hyperparameters, which is crucial for model training, especially in the active learning setting where a labeled validation set may not be available.  ( 2 min )
    Transformer Module Networks for Systematic Generalization in Visual Question Answering. (arXiv:2201.11316v2 [cs.CV] UPDATED)
    Transformers achieve great performance on Visual Question Answering (VQA). However, their systematic generalization capabilities, i.e., handling novel combinations of known concepts, is unclear. We reveal that Neural Module Networks (NMNs), i.e., question-specific compositions of modules that tackle a sub-task, achieve better or similar systematic generalization performance than the conventional Transformers, even though NMNs' modules are CNN-based. In order to address this shortcoming of Transformers with respect to NMNs, in this paper we investigate whether and how modularity can bring benefits to Transformers. Namely, we introduce Transformer Module Network (TMN), a novel NMN based on compositions of Transformer modules. TMNs achieve state-of-the-art systematic generalization performance in three VQA datasets, improving more than 30% over standard Transformers for novel compositions of sub-tasks. We show that not only the module composition but also the module specialization for each sub-task are the key of such performance gain.  ( 2 min )
    Combining Physics and Deep Learning to learn Continuous-Time Dynamics Models. (arXiv:2110.01894v2 [cs.LG] UPDATED)
    Deep learning has been widely used within learning algorithms for robotics. One disadvantage of deep networks is that these networks are black-box representations. Therefore, the learned approximations ignore the existing knowledge of physics or robotics. Especially for learning dynamics models, these black-box models are not desirable as the underlying principles are well understood and the standard deep networks can learn dynamics that violate these principles. To learn dynamics models with deep networks that guarantee physically plausible dynamics, we introduce physics-inspired deep networks that combine first principles from physics with deep learning. We incorporate Lagrangian mechanics within the model learning such that all approximated models adhere to the laws of physics and conserve energy. Deep Lagrangian Networks (DeLaN) parametrize the system energy using two networks. The parameters are obtained by minimizing the squared residual of the Euler-Lagrange differential equation. Therefore, the resulting model does not require specific knowledge of the individual system, is interpretable, and can be used as a forward, inverse, and energy model. Previously these properties were only obtained when using system identification techniques that require knowledge of the kinematic structure. We apply DeLaN to learning dynamics models and apply these models to control simulated and physical rigid body systems. The results show that the proposed approach obtains dynamics models that can be applied to physical systems for real-time control. Compared to standard deep networks, the physics-inspired models learn better models and capture the underlying structure of the dynamics.  ( 3 min )
    Data-Centric Learning from Unlabeled Graphs with Diffusion Model. (arXiv:2303.10108v1 [cs.LG])
    Graph property prediction tasks are important and numerous. While each task offers a small size of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is training a model with the unlabeled graphs on self-supervised tasks and then fine-tuning the model on the prediction tasks. However, the self-supervised task knowledge could not be aligned or sometimes conflicted with what the predictions needed. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives to guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fourteen existing various methods on fifteen tasks. The performance improvement brought by unlabeled data is visible as the generated labeled examples unlike self-supervised learning.
    CELEST: Federated Learning for Globally Coordinated Threat Detection. (arXiv:2205.11459v3 [cs.CR] UPDATED)
    The cyber-threat landscape has evolved tremendously in recent years, with new threat variants emerging daily, and large-scale coordinated campaigns becoming more prevalent. In this study, we propose CELEST (CollaborativE LEarning for Scalable Threat detection, a federated machine learning framework for global threat detection over HTTP, which is one of the most commonly used protocols for malware dissemination and communication. CELEST leverages federated learning in order to collaboratively train a global model across multiple clients who keep their data locally, thus providing increased privacy and confidentiality assurances. Through a novel active learning component integrated with the federated learning technique, our system continuously discovers and learns the behavior of new, evolving, and globally-coordinated cyber threats. We show that CELEST is able to expose attacks that are largely invisible to individual organizations. For instance, in one challenging attack scenario with data exfiltration malware, the global model achieves a three-fold increase in Precision-Recall AUC compared to the local model. We also design a poisoning detection and mitigation method, DTrust, specifically designed for federated learning in the collaborative threat detection domain. DTrust successfully detects poisoning clients using the feedback from participating clients to investigate and remove them from the training process. We deploy CELEST on two university networks and show that it is able to detect the malicious HTTP communication with high precision and low false positive rates. Furthermore, during its deployment, CELEST detected a set of previously unknown 42 malicious URLs and 20 malicious domains in one day, which were confirmed to be malicious by VirusTotal.  ( 3 min )
    Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs. (arXiv:2303.10165v1 [cs.LG])
    We study reward-free reinforcement learning (RL) with linear function approximation, where the agent works in two phases: (1) in the exploration phase, the agent interacts with the environment but cannot access the reward; and (2) in the planning phase, the agent is given a reward function and is expected to find a near-optimal policy based on samples collected in the exploration phase. The sample complexities of existing reward-free algorithms have a polynomial dependence on the planning horizon, which makes them intractable for long planning horizon RL problems. In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs), where the transition probability can be parameterized as a linear combination of known feature mappings. At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward and a high-order moment estimator for the aleatoric and epistemic uncertainties. When the total reward is bounded by $1$, we show that our algorithm only needs to explore $\tilde O( d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy, where $d$ is the dimension of the feature mapping. The sample complexity of our algorithm only has a polylogarithmic dependence on the planning horizon and therefore is ``horizon-free''. In addition, we provide an $\Omega(d^2\varepsilon^{-2})$ sample complexity lower bound, which matches the sample complexity of our algorithm up to logarithmic factors, suggesting that our algorithm is optimal.  ( 3 min )
    Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model. (arXiv:2005.12900v7 [cs.LG] UPDATED)
    This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$. The current paper overcomes this barrier by certifying the minimax optimality of two algorithms -- a perturbed model-based algorithm and a conservative model-based algorithm -- as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs, and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible).  ( 3 min )
    Deep Image Feature Learning with Fuzzy Rules. (arXiv:1905.10575v3 [cs.CV] UPDATED)
    The methods of extracting image features are the key to many image processing tasks. At present, the most popular method is the deep neural network which can automatically extract robust features through end-to-end training instead of hand-crafted feature extraction. However, the deep neural network currently faces many challenges: 1) its effectiveness is heavily dependent on large datasets, so the computational complexity is very high; 2) it is usually regarded as a black box model with poor interpretability. To meet the above challenges, a more interpretable and scalable feature learning method, i.e., deep image feature learning with fuzzy rules (DIFL-FR), is proposed in the paper, which combines the rule-based fuzzy modeling technique and the deep stacked learning strategy. The method progressively learns image features through a layer-by-layer manner based on fuzzy rules, so the feature learning process can be better explained by the generated rules. More importantly, the learning process of the method is only based on forward propagation without back propagation and iterative learning, which results in the high learning efficiency. In addition, the method is under the settings of unsupervised learning and can be easily extended to scenes of supervised and semi-supervised learning. Extensive experiments are conducted on image datasets of different scales. The results obviously show the effectiveness of the proposed method.  ( 3 min )
    Generalized partitioned local depth. (arXiv:2303.10167v1 [stat.ML])
    In this paper we provide a generalization of the concept of cohesion as introduced recently by Berenhaut, Moore and Melvin [Proceedings of the National Academy of Sciences, 119 (4) (2022)]. The formulation presented builds on the technique of partitioned local depth by distilling two key probabilistic concepts: local relevance and support division. Earlier results are extended within the new context, and examples of applications to revealing communities in data with uncertainty are included.  ( 2 min )
    An evaluation framework for dimensionality reduction through sectional curvature. (arXiv:2303.09909v1 [cs.LG])
    Unsupervised machine learning lacks ground truth by definition. This poses a major difficulty when designing metrics to evaluate the performance of such algorithms. In sharp contrast with supervised learning, for which plenty of quality metrics have been studied in the literature, in the field of dimensionality reduction only a few over-simplistic metrics has been proposed. In this work, we aim to introduce the first highly non-trivial dimensionality reduction performance metric. This metric is based on the sectional curvature behaviour arising from Riemannian geometry. To test its feasibility, this metric has been used to evaluate the performance of the most commonly used dimension reduction algorithms in the state of the art. Furthermore, to make the evaluation of the algorithms robust and representative, using curvature properties of planar curves, a new parameterized problem instance generator has been constructed in the form of a function generator. Experimental results are consistent with what could be expected based on the design and characteristics of the evaluated algorithms and the features of the data instances used to feed the method.  ( 2 min )
    Learning Augmented Online Facility Location. (arXiv:2107.08277v3 [cs.DS] UPDATED)
    Following the research agenda initiated by Munoz & Vassilvitskii [1] and Lykouris & Vassilvitskii [2] on learning-augmented online algorithms for classical online optimization problems, in this work, we consider the Online Facility Location problem under this framework. In Online Facility Location (OFL), demands arrive one-by-one in a metric space and must be (irrevocably) assigned to an open facility upon arrival, without any knowledge about future demands. We present an online algorithm for OFL that exploits potentially imperfect predictions on the locations of the optimal facilities. We prove that the competitive ratio decreases smoothly from sublogarithmic in the number of demands to constant, as the error, i.e., the total distance of the predicted locations to the optimal facility locations, decreases towards zero. We complement our analysis with a matching lower bound establishing that the dependence of the algorithm's competitive ratio on the error is optimal, up to constant factors. Finally, we evaluate our algorithm on real world data and compare our learning augmented approach with the current best online algorithm for the problem.  ( 2 min )
    Epigenetics Algorithms: Self-Reinforcement-Attention mechanism to regulate chromosomes expression. (arXiv:2303.10154v1 [cs.NE])
    Genetic algorithms are a well-known example of bio-inspired heuristic methods. They mimic natural selection by modeling several operators such as mutation, crossover, and selection. Recent discoveries about Epigenetics regulation processes that occur "on top of" or "in addition to" the genetic basis for inheritance involve changes that affect and improve gene expression. They raise the question of improving genetic algorithms (GAs) by modeling epigenetics operators. This paper proposes a new epigenetics algorithm that mimics the epigenetics phenomenon known as DNA methylation. The novelty of our epigenetics algorithms lies primarily in taking advantage of attention mechanisms and deep learning, which fits well with the genes enhancing/silencing concept. The paper develops theoretical arguments and presents empirical studies to exhibit the capability of the proposed epigenetics algorithms to solve more complex problems efficiently than has been possible with simple GAs; for example, facing two Non-convex (multi-peaks) optimization problems as presented in this paper, the proposed epigenetics algorithm provides good performances and shows an excellent ability to overcome the lack of local optimum and thus find the global optimum.  ( 2 min )
    MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation. (arXiv:2303.09975v1 [eess.IV])
    There has been exploding interest in embracing Transformer-based architectures for medical image segmentation. However, the lack of large-scale annotated medical datasets make achieving performances equivalent to those in natural images challenging. Convolutional networks, in contrast, have higher inductive biases and consequently, are easily trainable to high performance. Recently, the ConvNeXt architecture attempted to modernize the standard ConvNet by mirroring Transformer blocks. In this work, we improve upon this to design a modernized and scalable convolutional architecture customized to challenges of data-scarce medical settings. We introduce MedNeXt, a Transformer-inspired large kernel segmentation network which introduces - 1) A fully ConvNeXt 3D Encoder-Decoder Network for medical image segmentation, 2) Residual ConvNeXt up and downsampling blocks to preserve semantic richness across scales, 3) A novel technique to iteratively increase kernel sizes by upsampling small kernel networks, to prevent performance saturation on limited medical data, 4) Compound scaling at multiple levels (depth, width, kernel size) of MedNeXt. This leads to state-of-the-art performance on 4 tasks on CT and MRI modalities and varying dataset sizes, representing a modernized deep architecture for medical image segmentation.  ( 2 min )
    She Elicits Requirements and He Tests: Software Engineering Gender Bias in Large Language Models. (arXiv:2303.10131v1 [cs.SE])
    Implicit gender bias in software development is a well-documented issue, such as the association of technical roles with men. To address this bias, it is important to understand it in more detail. This study uses data mining techniques to investigate the extent to which 56 tasks related to software development, such as assigning GitHub issues and testing, are affected by implicit gender bias embedded in large language models. We systematically translated each task from English into a genderless language and back, and investigated the pronouns associated with each task. Based on translating each task 100 times in different permutations, we identify a significant disparity in the gendered pronoun associations with different tasks. Specifically, requirements elicitation was associated with the pronoun "he" in only 6% of cases, while testing was associated with "he" in 100% of cases. Additionally, tasks related to helping others had a 91% association with "he" while the same association for tasks related to asking coworkers was only 52%. These findings reveal a clear pattern of gender bias related to software development tasks and have important implications for addressing this issue both in the training of large language models and in broader society.  ( 3 min )
    Generate, Transform, Answer: Question Specific Tool Synthesis for Tabular Data. (arXiv:2303.10138v1 [cs.LG])
    Tabular question answering (TQA) presents a challenging setting for neural systems by requiring joint reasoning of natural language with large amounts of semi-structured data. Unlike humans who use programmatic tools like filters to transform data before processing, language models in TQA process tables directly, resulting in information loss as table size increases. In this paper we propose ToolWriter to generate query specific programs and detect when to apply them to transform tables and align them with the TQA model's capabilities. Focusing ToolWriter to generate row-filtering tools improves the state-of-the-art for WikiTableQuestions and WikiSQL with the most performance gained on long tables. By investigating headroom, our work highlights the broader potential for programmatic tools combined with neural components to manipulate large amounts of structured data.  ( 2 min )
    Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics. (arXiv:2303.09601v1 [cs.LG])
    We introduce a Reinforcement Learning Psychotherapy AI Companion that generates topic recommendations for therapists based on patient responses. The system uses Deep Reinforcement Learning (DRL) to generate multi-objective policies for four different psychiatric conditions: anxiety, depression, schizophrenia, and suicidal cases. We present our experimental results on the accuracy of recommended topics using three different scales of working alliance ratings: task, bond, and goal. We show that the system is able to capture the real data (historical topics discussed by the therapists) relatively well, and that the best performing models vary by disorder and rating scale. To gain interpretable insights into the learned policies, we visualize policy trajectories in a 2D principal component analysis space and transition matrices. These visualizations reveal distinct patterns in the policies trained with different reward signals and trained on different clinical diagnoses. Our system's success in generating DIsorder-Specific Multi-Objective Policies (DISMOP) and interpretable policy dynamics demonstrates the potential of DRL in providing personalized and efficient therapeutic recommendations.  ( 2 min )
    Causal Discovery from Temporal Data: An Overview and New Perspectives. (arXiv:2303.10112v1 [cs.LG])
    Temporal data, representing chronological observations of complex systems, has always been a typical data structure that can be widely generated by many domains, such as industry, medicine and finance. Analyzing this type of data is extremely valuable for various applications. Thus, different temporal data analysis tasks, eg, classification, clustering and prediction, have been proposed in the past decades. Among them, causal discovery, learning the causal relations from temporal data, is considered an interesting yet critical task and has attracted much research attention. Existing casual discovery works can be divided into two highly correlated categories according to whether the temporal data is calibrated, ie, multivariate time series casual discovery, and event sequence casual discovery. However, most previous surveys are only focused on the time series casual discovery and ignore the second category. In this paper, we specify the correlation between the two categories and provide a systematical overview of existing solutions. Furthermore, we provide public datasets, evaluation metrics and new perspectives for temporal data casual discovery.  ( 2 min )
    DSDP: A Blind Docking Strategy Accelerated by GPUs. (arXiv:2303.09916v1 [physics.chem-ph])
    Virtual screening, including molecular docking, plays an essential role in drug discovery. Many traditional and machine-learning based methods are available to fulfil the docking task. The traditional docking methods are normally extensively time-consuming, and their performance in blind docking remains to be improved. Although the runtime of docking based on machine learning is significantly decreased, their accuracy is still limited. In this study, we take the advantage of both traditional and machine-learning based methods, and present a method Deep Site and Docking Pose (DSDP) to improve the performance of blind docking. For the traditional blind docking, the entire protein is covered by a cube, and the initial positions of ligands are randomly generated in the cube. In contract, DSDP can predict the binding site of proteins and provide an accurate searching space and initial positions for the further conformational sampling. The docking task of DSDP makes use of the score function and a similar but modified searching strategy of AutoDock Vina, accelerated by implementation in GPUs. We systematically compare its performance with the state-of-the-art methods, including Autodock Vina, GNINA, QuickVina, SMINA, and DiffDock. DSDP reaches a 29.8% top-1 success rate (RMSD < 2 {\AA}) on an unbiased and challenging test dataset with 1.2 s wall-clock computational time per system. Its performances on DUD-E dataset and the time-split PDBBind dataset used in EquiBind, TankBind, and DiffDock are also effective, presenting a 57.2% and 41.8% top-1 success rate with 0.8 s and 1.0 s per system, respectively.  ( 3 min )
    Efficient and Feasible Robotic Assembly Sequence Planning via Graph Representation Learning. (arXiv:2303.10135v1 [cs.RO])
    Automatic Robotic Assembly Sequence Planning (RASP) can significantly improve productivity and resilience in modern manufacturing along with the growing need for greater product customization. One of the main challenges in realizing such automation resides in efficiently finding solutions from a growing number of potential sequences for increasingly complex assemblies. Besides, costly feasibility checks are always required for the robotic system. To address this, we propose a holistic graphical approach including a graph representation called Assembly Graph for product assemblies and a policy architecture, Graph Assembly Processing Network, dubbed GRACE for assembly sequence generation. Secondly, we use GRACE to extract meaningful information from the graph input and predict assembly sequences in a step-by-step manner. In experiments, we show that our approach can predict feasible assembly sequences across product variants of aluminum profiles based on data collected in simulation of a dual-armed robotic system. We further demonstrate that our method is capable of detecting infeasible assemblies, substantially alleviating the undesirable impacts from false predictions, and hence facilitating real-world deployment soon. Code and training data will be open-sourced.  ( 2 min )
    No Fear of Classifier Biases: Neural Collapse Inspired Federated Learning with Synthetic and Fixed Classifier. (arXiv:2303.10058v1 [cs.LG])
    Data heterogeneity is an inherent challenge that hinders the performance of federated learning (FL). Recent studies have identified the biased classifiers of local models as the key bottleneck. Previous attempts have used classifier calibration after FL training, but this approach falls short in improving the poor feature representations caused by training-time classifier biases. Resolving the classifier bias dilemma in FL requires a full understanding of the mechanisms behind the classifier. Recent advances in neural collapse have shown that the classifiers and feature prototypes under perfect training scenarios collapse into an optimal structure called simplex equiangular tight frame (ETF). Building on this neural collapse insight, we propose a solution to the FL's classifier bias problem by utilizing a synthetic and fixed ETF classifier during training. The optimal classifier structure enables all clients to learn unified and optimal feature representations even under extremely heterogeneous data. We devise several effective modules to better adapt the ETF structure in FL, achieving both high generalization and personalization. Extensive experiments demonstrate that our method achieves state-of-the-art performances on CIFAR-10, CIFAR-100, and Tiny-ImageNet.  ( 2 min )
    Posterior Estimation Using Deep Learning: A Simulation Study of Compartmental Modeling in Dynamic PET. (arXiv:2303.10057v1 [eess.IV])
    Background: In medical imaging, images are usually treated as deterministic, while their uncertainties are largely underexplored. Purpose: This work aims at using deep learning to efficiently estimate posterior distributions of imaging parameters, which in turn can be used to derive the most probable parameters as well as their uncertainties. Methods: Our deep learning-based approaches are based on a variational Bayesian inference framework, which is implemented using two different deep neural networks based on conditional variational auto-encoder (CVAE), CVAE-dual-encoder and CVAE-dual-decoder. The conventional CVAE framework, i.e., CVAE-vanilla, can be regarded as a simplified case of these two neural networks. We applied these approaches to a simulation study of dynamic brain PET imaging using a reference region-based kinetic model. Results: In the simulation study, we estimated posterior distributions of PET kinetic parameters given a measurement of time-activity curve. Our proposed CVAE-dual-encoder and CVAE-dual-decoder yield results that are in good agreement with the asymptotically unbiased posterior distributions sampled by Markov Chain Monte Carlo (MCMC). The CVAE-vanilla can also be used for estimating posterior distributions, although it has an inferior performance to both CVAE-dual-encoder and CVAE-dual-decoder. Conclusions: We have evaluated the performance of our deep learning approaches for estimating posterior distributions in dynamic brain PET. Our deep learning approaches yield posterior distributions, which are in good agreement with unbiased distributions estimated by MCMC. All these neural networks have different characteristics and can be chosen by the user for specific applications. The proposed methods are general and can be adapted to other problems.  ( 3 min )
    How robust is randomized blind deconvolution via nuclear norm minimization against adversarial noise?. (arXiv:2303.10030v1 [cs.IT])
    In this paper, we study the problem of recovering two unknown signals from their convolution, which is commonly referred to as blind deconvolution. Reformulation of blind deconvolution as a low-rank recovery problem has led to multiple theoretical recovery guarantees in the past decade due to the success of the nuclear norm minimization heuristic. In particular, in the absence of noise, exact recovery has been established for sufficiently incoherent signals contained in lower-dimensional subspaces. However, if the convolution is corrupted by additive bounded noise, the stability of the recovery problem remains much less understood. In particular, existing reconstruction bounds involve large dimension factors and therefore fail to explain the empirical evidence for dimension-independent robustness of nuclear norm minimization. Recently, theoretical evidence has emerged for ill-posed behavior of low-rank matrix recovery for sufficiently small noise levels. In this work, we develop improved recovery guarantees for blind deconvolution with adversarial noise which exhibit square-root scaling in the noise level. Hence, our results are consistent with existing counterexamples which speak against linear scaling in the noise level as demonstrated for related low-rank matrix recovery problems.  ( 2 min )
    Visual Information Matters for ASR Error Correction. (arXiv:2303.10160v1 [eess.AS])
    Aiming to improve the Automatic Speech Recognition (ASR) outputs with a post-processing step, ASR error correction (EC) techniques have been widely developed due to their efficiency in using parallel text data. Previous works mainly focus on using text or/ and speech data, which hinders the performance gain when not only text and speech information, but other modalities, such as visual information are critical for EC. The challenges are mainly two folds: one is that previous work fails to emphasize visual information, thus rare exploration has been studied. The other is that the community lacks a high-quality benchmark where visual information matters for the EC models. Therefore, this paper provides 1) simple yet effective methods, namely gated fusion and image captions as prompts to incorporate visual information to help EC; 2) large-scale benchmark datasets, namely Visual-ASR-EC, where each item in the training data consists of visual, speech, and text information, and the test data are carefully selected by human annotators to ensure that even humans could make mistakes when visual information is missing. Experimental results show that using captions as prompts could effectively use the visual information and surpass state-of-the-art methods by upto 1.2% in Word Error Rate(WER), which also indicates that visual information is critical in our proposed Visual-ASR-EC dataset  ( 2 min )
    A Data-Driven Model-Reference Adaptive Control Approach Based on Reinforcement Learning. (arXiv:2303.09994v1 [eess.SY])
    Model-reference adaptive systems refer to a consortium of techniques that guide plants to track desired reference trajectories. Approaches based on theories like Lyapunov, sliding surfaces, and backstepping are typically employed to advise adaptive control strategies. The resulting solutions are often challenged by the complexity of the reference model and those of the derived control strategies. Additionally, the explicit dependence of the control strategies on the process dynamics and reference dynamical models may contribute in degrading their efficiency in the face of uncertain or unknown dynamics. A model-reference adaptive solution is developed here for autonomous systems where it solves the Hamilton-Jacobi-Bellman equation of an error-based structure. The proposed approach describes the process with an integral temporal difference equation and solves it using an integral reinforcement learning mechanism. This is done in real-time without knowing or employing the dynamics of either the process or reference model in the control strategies. A class of aircraft is adopted to validate the proposed technique.  ( 2 min )
    Efficient Neural Generation of 4K Masks for Homogeneous Diffusion Inpainting. (arXiv:2303.10096v1 [eess.IV])
    With well-selected data, homogeneous diffusion inpainting can reconstruct images from sparse data with high quality. While 4K colour images of size 3840 x 2160 can already be inpainted in real time, optimising the known data for applications like image compression remains challenging: Widely used stochastic strategies can take days for a single 4K image. Recently, a first neural approach for this so-called mask optimisation problem offered high speed and good quality for small images. It trains a mask generation network with the help of a neural inpainting surrogate. However, these mask networks can only output masks for the resolution and mask density they were trained for. We solve these problems and enable mask optimisation for high-resolution images through a neuroexplicit coarse-to-fine strategy. Additionally, we improve the training and interpretability of mask networks by including a numerical inpainting solver directly into the network. This allows to generate masks for 4K images in around 0.6 seconds while exceeding the quality of stochastic methods on practically relevant densities. Compared to popular existing approaches, this is an acceleration of up to four orders of magnitude.  ( 3 min )
    cito: An R package for training neural networks using torch. (arXiv:2303.09599v1 [cs.LG])
    1. Deep neural networks (DNN) have become a central class of algorithms for regression and classification tasks. Although some packages exist that allow users to specify DNN in R, those are rather limited in their functionality. Most current deep learning applications therefore rely on one of the major deep learning frameworks, PyTorch or TensorFlow, to build and train DNN. However, using these frameworks requires substantially more training and time than comparable regression or machine learning packages in the R environment. 2. Here, we present cito, an user-friendly R package for deep learning. cito allows R users to specify deep neural networks in the familiar formula syntax used by most modeling functions in R. In the background, cito uses torch to fit the models, taking advantage of all the numerical optimizations of the torch library, including the ability to switch between training models on CPUs or GPUs. Moreover, cito includes many user-friendly functions for predictions and an explainable Artificial Intelligence (xAI) pipeline for the fitted models. 3. We showcase a typical analysis pipeline using cito, including its built-in xAI features to explore the trained DNN, by building a species distribution model of the African elephant. 4. In conclusion, cito provides a user-friendly R framework to specify, deploy and interpret deep neural networks based on torch. The current stable CRAN version mainly supports fully connected DNNs, but it is planned that future versions will also include CNNs and RNNs.  ( 3 min )
    Towards AI-controlled FES-restoration of movements: Learning cycling stimulation pattern with reinforcement learning. (arXiv:2303.09986v1 [cs.RO])
    Functional electrical stimulation (FES) has been increasingly integrated with other rehabilitation devices, including robots. FES cycling is one of the common FES applications in rehabilitation, which is performed by stimulating leg muscles in a certain pattern. The appropriate pattern varies across individuals and requires manual tuning which can be time-consuming and challenging for the individual user. Here, we present an AI-based method for finding the patterns, which requires no extra hardware or sensors. Our method has two phases, starting with finding model-based patterns using reinforcement learning and detailed musculoskeletal models. The models, built using open-source software, can be customised through our automated script and can be therefore used by non-technical individuals without extra cost. Next, our method fine-tunes the pattern using real cycling data. We test our both in simulation and experimentally on a stationary tricycle. In the simulation test, our method can robustly deliver model-based patterns for different cycling configurations. The experimental evaluation shows that our method can find a model-based pattern that induces higher cycling speed than an EMG-based pattern. By using just 100 seconds of cycling data, our method can deliver a fine-tuned pattern that gives better cycling performance. Beyond FES cycling, this work is a showcase, displaying the feasibility and potential of human-in-the-loop AI in real-world rehabilitation.  ( 3 min )
    Deep Author Name Disambiguation using DBLP Data. (arXiv:2303.10067v1 [cs.DL])
    In the academic world, the number of scientists grows every year and so does the number of authors sharing the same names. Consequently, it challenging to assign newly published papers to their respective authors. Therefore, Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use data collected from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last names and same first name initials. The author within each group is identified by capturing the relation with his/her co-authors and area of research, represented by the titles of the validated publications of the corresponding author. To this end, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.  ( 2 min )
    On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering. (arXiv:2303.09877v1 [stat.ML])
    Self-supervised learning is a central component in recent approaches to deep multi-view clustering (MVC). However, we find large variations in the development of self-supervision-based methods for deep MVC, potentially slowing the progress of the field. To address this, we present DeepMVC, a unified framework for deep MVC that includes many recent methods as instances. We leverage our framework to make key observations about the effect of self-supervision, and in particular, drawbacks of aligning representations with contrastive learning. Further, we prove that contrastive alignment can negatively influence cluster separability, and that this effect becomes worse when the number of views increases. Motivated by our findings, we develop several new DeepMVC instances with new forms of self-supervision. We conduct extensive experiments and find that (i) in line with our theoretical findings, contrastive alignments decreases performance on datasets with many views; (ii) all methods benefit from some form of self-supervision; and (iii) our new instances outperform previous methods on several datasets. Based on our results, we suggest several promising directions for future research. To enhance the openness of the field, we provide an open-source implementation of DeepMVC, including recent models and our new instances. Our implementation includes a consistent evaluation protocol, facilitating fair and accurate evaluation of methods and components.  ( 3 min )
    Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures. (arXiv:2303.09981v1 [cs.LG])
    Realistic aircraft trajectory models are useful in the design and validation of air traffic management (ATM) systems. Models of aircraft operated under instrument flight rules (IFR) require capturing the variability inherent in how aircraft follow standard flight procedures. The variability in aircraft behavior varies among flight stages. In this paper, we propose a probabilistic model that can learn the variability from the procedural data and flight tracks collected from radar surveillance data. For each segment, a Gaussian mixture model is used to learn the deviations of aircraft trajectories from their procedures. Given new procedures, we can generate synthetic trajectories by sampling a series of deviations from the trained Gaussian distributions and reconstructing the aircraft trajectory using the deviations and the procedures. We extend this method to capture pairwise correlations between aircraft and show how a pairwise model can be used to generate traffic involving an arbitrary number of aircraft. We demonstrate the proposed models on the arrival tracks and procedures of the John F. Kennedy International Airport. The distributional similarity between the original and the synthetic trajectory dataset was evaluated using the Jensen-Shannon divergence between the empirical distributions of different variables. We also provide qualitative analyses of the synthetic trajectories generated from the models.  ( 2 min )
    Denoising Diffusion Autoencoders are Unified Self-supervised Learners. (arXiv:2303.09769v1 [cs.CV])
    Inspired by recent advances in diffusion models, which are reminiscent of denoising autoencoders, we investigate whether they can acquire discriminative representations for classification via generative pre-training. This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners: by pre-training on unconditional image generation, DDAE has already learned strongly linear-separable representations at its intermediate layers without auxiliary encoders, thus making diffusion pre-training emerge as a general approach for self-supervised generative and discriminative learning. To verify this, we perform linear probe and fine-tuning evaluations on multi-class datasets. Our diffusion-based approach achieves 95.9% and 50.0% linear probe accuracies on CIFAR-10 and Tiny-ImageNet, respectively, and is comparable to masked autoencoders and contrastive learning for the first time. Additionally, transfer learning from ImageNet confirms DDAE's suitability for latent-space Vision Transformers, suggesting the potential for scaling DDAEs as unified foundation models.  ( 2 min )
    Mpox-AISM: AI-Mediated Super Monitoring for Forestalling Monkeypox Spread. (arXiv:2303.09780v1 [eess.IV])
    The challenge on forestalling monkeypox (Mpox) spread is the timely, convenient and accurate diagnosis for earlystage infected individuals. Here, we propose a remote and realtime online visualization strategy, called "Super Monitoring" to construct a low cost, convenient, timely and unspecialized diagnosis of early-stage Mpox. Such AI-mediated "Super Monitoring" (Mpox-AISM) invokes a framework assembled by deep learning, data augmentation and self-supervised learning, as well as professionally classifies four subtypes according to dataset characteristics and evolution trend of Mpox and seven other types of dermatopathya with high similarity, hence these features together with reasonable program interface and threshold setting ensure that its Recall (Sensitivity) was beyond 95.9% and the specificity was almost 100%. As a result, with the help of cloud service on Internet and communication terminal, this strategy can be potentially utilized for the real-time detection of earlystage Mpox in various scenarios including entry-exit inspection in airport, family doctor, rural area in underdeveloped region and wild to effectively shorten the window period of Mpox spread.  ( 2 min )
    Enhancing the Role of Context in Region-Word Alignment for Object Detection. (arXiv:2303.10093v1 [cs.CV])
    Vision-language pretraining to learn a fine-grained, region-word alignment between image-caption pairs has propelled progress in open-vocabulary object detection. We observe that region-word alignment methods are typically used in detection with respect to only object nouns, and the impact of other rich context in captions, such as attributes, is unclear. In this study, we explore how language context affects downstream object detection and propose to enhance the role of context. In particular, we show how to strategically contextualize the grounding pretraining objective for improved alignment. We further hone in on attributes as especially useful object context and propose a novel adjective and noun-based negative sampling strategy for increasing their focus in contrastive learning. Overall, our methods enhance object detection when compared to the state-of-the-art in region-word pretraining. We also highlight the fine-grained utility of an attribute-sensitive model through text-region retrieval and phrase grounding analysis.  ( 2 min )
    CoLT5: Faster Long-Range Transformers with Conditional Computation. (arXiv:2303.09752v1 [cs.CL])
    Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.  ( 2 min )
    Stochastic Submodular Maximization via Polynomial Estimators. (arXiv:2303.09960v1 [cs.LG])
    In this paper, we study stochastic submodular maximization problems with general matroid constraints, that naturally arise in online learning, team formation, facility location, influence maximization, active learning and sensing objective functions. In other words, we focus on maximizing submodular functions that are defined as expectations over a class of submodular functions with an unknown distribution. We show that for monotone functions of this form, the stochastic continuous greedy algorithm attains an approximation ratio (in expectation) arbitrarily close to $(1-1/e) \approx 63\%$ using a polynomial estimation of the gradient. We argue that using this polynomial estimator instead of the prior art that uses sampling eliminates a source of randomness and experimentally reduces execution time.  ( 2 min )
    ESCAPE: Countering Systematic Errors from Machine's Blind Spots via Interactive Visual Analysis. (arXiv:2303.09657v1 [cs.LG])
    Classification models learn to generalize the associations between data samples and their target classes. However, researchers have increasingly observed that machine learning practice easily leads to systematic errors in AI applications, a phenomenon referred to as AI blindspots. Such blindspots arise when a model is trained with training samples (e.g., cat/dog classification) where important patterns (e.g., black cats) are missing or periphery/undesirable patterns (e.g., dogs with grass background) are misleading towards a certain class. Even more sophisticated techniques cannot guarantee to capture, reason about, and prevent the spurious associations. In this work, we propose ESCAPE, a visual analytic system that promotes a human-in-the-loop workflow for countering systematic errors. By allowing human users to easily inspect spurious associations, the system facilitates users to spontaneously recognize concepts associated misclassifications and evaluate mitigation strategies that can reduce biased associations. We also propose two statistical approaches, relative concept association to better quantify the associations between a concept and instances, and debias method to mitigate spurious associations. We demonstrate the utility of our proposed ESCAPE system and statistical measures through extensive evaluation including quantitative experiments, usage scenarios, expert interviews, and controlled user experiments.  ( 2 min )
    TypeT5: Seq2seq Type Inference using Static Analysis. (arXiv:2303.09564v1 [cs.SE])
    There has been growing interest in automatically predicting missing type annotations in programs written in Python and JavaScript. While prior methods have achieved impressive accuracy when predicting the most common types, they often perform poorly on rare or complex types. In this paper, we present a new type inference method that treats type prediction as a code infilling task by leveraging CodeT5, a state-of-the-art seq2seq pre-trained language model for code. Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model. We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context, allowing information exchange between related code elements. Our evaluation shows that the proposed approach, TypeT5, not only achieves a higher overall accuracy (particularly on rare and complex types) but also produces more coherent results with fewer type errors -- while enabling easy user intervention.  ( 2 min )
    Towards a Foundation Model for Neural Network Wavefunctions. (arXiv:2303.09949v1 [physics.comp-ph])
    Deep neural networks have become a highly accurate and powerful wavefunction ansatz in combination with variational Monte Carlo methods for solving the electronic Schr\"odinger equation. However, despite their success and favorable scaling, these methods are still computationally too costly for wide adoption. A significant obstacle is the requirement to optimize the wavefunction from scratch for each new system, thus requiring long optimization. In this work, we propose a novel neural network ansatz, which effectively maps uncorrelated, computationally cheap Hartree-Fock orbitals, to correlated, high-accuracy neural network orbitals. This ansatz is inherently capable of learning a single wavefunction across multiple compounds and geometries, as we demonstrate by successfully transferring a wavefunction model pre-trained on smaller fragments to larger compounds. Furthermore, we provide ample experimental evidence to support the idea that extensive pre-training of a such a generalized wavefunction model across different compounds and geometries could lead to a foundation wavefunction model. Such a model could yield high-accuracy ab-initio energies using only minimal computational effort for fine-tuning and evaluation of observables.  ( 2 min )
    A Policy Iteration Approach for Flock Motion Control. (arXiv:2303.10035v1 [eess.SY])
    The flocking motion control is concerned with managing the possible conflicts between local and team objectives of multi-agent systems. The overall control process guides the agents while monitoring the flock-cohesiveness and localization. The underlying mechanisms may degrade due to overlooking the unmodeled uncertainties associated with the flock dynamics and formation. On another side, the efficiencies of the various control designs rely on how quickly they can adapt to different dynamic situations in real-time. An online model-free policy iteration mechanism is developed here to guide a flock of agents to follow an independent command generator over a time-varying graph topology. The strength of connectivity between any two agents or the graph edge weight is decided using a position adjacency dependent function. An online recursive least squares approach is adopted to tune the guidance strategies without knowing the dynamics of the agents or those of the command generator. It is compared with another reinforcement learning approach from the literature which is based on a value iteration technique. The simulation results of the policy iteration mechanism revealed fast learning and convergence behaviors with less computational effort.  ( 2 min )
    Finding Competence Regions in Domain Generalization. (arXiv:2303.09989v1 [cs.LG])
    We propose a "learning to reject" framework to address the problem of silent failures in Domain Generalization (DG), where the test distribution differs from the training distribution. Assuming a mild distribution shift, we wish to accept out-of-distribution (OOD) data whenever a model's estimated competence foresees trustworthy responses, instead of rejecting OOD data outright. Trustworthiness is then predicted via a proxy incompetence score that is tightly linked to the performance of a classifier. We present a comprehensive experimental evaluation of incompetence scores for classification and highlight the resulting trade-offs between rejection rate and accuracy gain. For comparability with prior work, we focus on standard DG benchmarks and consider the effect of measuring incompetence via different learned representations in a closed versus an open world setting. Our results suggest that increasing incompetence scores are indeed predictive of reduced accuracy, leading to significant improvements of the average accuracy below a suitable incompetence threshold. However, the scores are not yet good enough to allow for a favorable accuracy/rejection trade-off in all tested domains. Surprisingly, our results also indicate that classifiers optimized for DG robustness do not outperform a naive Empirical Risk Minimization (ERM) baseline in the competence region, that is, where test samples elicit low incompetence scores.  ( 2 min )
    Tribe or Not? Critical Inspection of Group Differences Using TribalGram. (arXiv:2303.09664v1 [cs.LG])
    With the rise of AI and data mining techniques, group profiling and group-level analysis have been increasingly used in many domains including policy making and direct marketing. In some cases, the statistics extracted from data may provide insights to a group's shared characteristics; in others, the group-level analysis can lead to problems including stereotyping and systematic oppression. How can analytic tools facilitate a more conscientious process in group analysis? In this work, we identify a set of accountable group analytics design guidelines to explicate the needs for group differentiation and preventing overgeneralization of a group. Following the design guidelines, we develop TribalGram, a visual analytic suite that leverages interpretable machine learning algorithms and visualization to offer inference assessment, model explanation, data corroboration, and sense-making. Through the interviews with domain experts, we showcase how our design and tools can bring a richer understanding of "groups" mined from the data.  ( 2 min )
    Diffusing the Optimal Topology: A Generative Optimization Approach. (arXiv:2303.09760v1 [cs.LG])
    Topology Optimization seeks to find the best design that satisfies a set of constraints while maximizing system performance. Traditional iterative optimization methods like SIMP can be computationally expensive and get stuck in local minima, limiting their applicability to complex or large-scale problems. Learning-based approaches have been developed to accelerate the topology optimization process, but these methods can generate designs with floating material and low performance when challenged with out-of-distribution constraint configurations. Recently, deep generative models, such as Generative Adversarial Networks and Diffusion Models, conditioned on constraints and physics fields have shown promise, but they require extensive pre-processing and surrogate models for improving performance. To address these issues, we propose a Generative Optimization method that integrates classic optimization like SIMP as a refining mechanism for the topology generated by a deep generative model. We also remove the need for conditioning on physical fields using a computationally inexpensive approximation inspired by classic ODE solutions and reduce the number of steps needed to generate a feasible and performant topology. Our method allows us to efficiently generate good topologies and explicitly guide them to regions with high manufacturability and high performance, without the need for external auxiliary models or additional labeled data. We believe that our method can lead to significant advancements in the design and optimization of structures in engineering applications, and can be applied to a broader spectrum of performance-aware engineering design problems.  ( 3 min )
    Transformers and Ensemble methods: A solution for Hate Speech Detection in Arabic languages. (arXiv:2303.09823v1 [cs.CL])
    This paper describes our participation in the shared task of hate speech detection, which is one of the subtasks of the CERIST NLP Challenge 2022. Our experiments evaluate the performance of six transformer models and their combination using 2 ensemble approaches. The best results on the training set, in a five-fold cross validation scenario, were obtained by using the ensemble approach based on the majority vote. The evaluation of this approach on the test set resulted in an F1-score of 0.60 and an Accuracy of 0.86.  ( 2 min )
    SUD$^2$: Supervision by Denoising Diffusion Models for Image Reconstruction. (arXiv:2303.09642v1 [cs.CV])
    Many imaging inverse problems$\unicode{x2014}$such as image-dependent in-painting and dehazing$\unicode{x2014}$are challenging because their forward models are unknown or depend on unknown latent parameters. While one can solve such problems by training a neural network with vast quantities of paired training data, such paired training data is often unavailable. In this paper, we propose a generalized framework for training image reconstruction networks when paired training data is scarce. In particular, we demonstrate the ability of image denoising algorithms and, by extension, denoising diffusion models to supervise network training in the absence of paired training data.  ( 2 min )
    mCPT at SemEval-2023 Task 3: Multilingual Label-Aware Contrastive Pre-Training of Transformers for Few- and Zero-shot Framing Detection. (arXiv:2303.09901v1 [cs.CL])
    This paper presents the winning system for the zero-shot Spanish framing detection task, which also achieves competitive places in eight additional languages. The challenge of the framing detection task lies in identifying a set of 14 frames when only a few or zero samples are available, i.e., a multilingual multi-label few- or zero-shot setting. Our developed solution employs a pre-training procedure based on multilingual Transformers using a label-aware contrastive loss function. In addition to describing the system, we perform an embedding space analysis and ablation study to demonstrate how our pre-training procedure supports framing detection to advance computational framing analysis.  ( 2 min )
    Discovering mesoscopic descriptions of collective movement with neural stochastic modelling. (arXiv:2303.09906v1 [cs.LG])
    Collective motion is an ubiquitous phenomenon in nature, inspiring engineers, physicists and mathematicians to develop mathematical models and bio-inspired designs. Collective motion at small to medium group sizes ($\sim$10-1000 individuals, also called the `mesoscale'), can show nontrivial features due to stochasticity. Therefore, characterizing both the deterministic and stochastic aspects of the dynamics is crucial in the study of mesoscale collective phenomena. Here, we use a physics-inspired, neural-network based approach to characterize the stochastic group dynamics of interacting individuals, through a stochastic differential equation (SDE) that governs the collective dynamics of the group. We apply this technique on both synthetic and real-world datasets, and identify the deterministic and stochastic aspects of the dynamics using drift and diffusion fields, enabling us to make novel inferences about the nature of order in these systems.  ( 2 min )
    Robust probabilistic inference via a constrained transport metric. (arXiv:2303.10085v1 [stat.ME])
    Flexible Bayesian models are typically constructed using limits of large parametric models with a multitude of parameters that are often uninterpretable. In this article, we offer a novel alternative by constructing an exponentially tilted empirical likelihood carefully designed to concentrate near a parametric family of distributions of choice with respect to a novel variant of the Wasserstein metric, which is then combined with a prior distribution on model parameters to obtain a robustified posterior. The proposed approach finds applications in a wide variety of robust inference problems, where we intend to perform inference on the parameters associated with the centering distribution in presence of outliers. Our proposed transport metric enjoys great computational simplicity, exploiting the Sinkhorn regularization for discrete optimal transport problems, and being inherently parallelizable. We demonstrate superior performance of our methodology when compared against state-of-the-art robust Bayesian inference methods. We also demonstrate equivalence of our approach with a nonparametric Bayesian formulation under a suitable asymptotic framework, testifying to its flexibility. The constrained entropy maximization that sits at the heart of our likelihood formulation finds its utility beyond robust Bayesian inference; an illustration is provided in a trustworthy machine learning application.  ( 2 min )
    An Adaptive Fuzzy Reinforcement Learning Cooperative Approach for the Autonomous Control of Flock Systems. (arXiv:2303.09946v1 [eess.SY])
    The flock-guidance problem enjoys a challenging structure where multiple optimization objectives are solved simultaneously. This usually necessitates different control approaches to tackle various objectives, such as guidance, collision avoidance, and cohesion. The guidance schemes, in particular, have long suffered from complex tracking-error dynamics. Furthermore, techniques that are based on linear feedback strategies obtained at equilibrium conditions either may not hold or degrade when applied to uncertain dynamic environments. Pre-tuned fuzzy inference architectures lack robustness under such unmodeled conditions. This work introduces an adaptive distributed technique for the autonomous control of flock systems. Its relatively flexible structure is based on online fuzzy reinforcement learning schemes which simultaneously target a number of objectives; namely, following a leader, avoiding collision, and reaching a flock velocity consensus. In addition to its resilience in the face of dynamic disturbances, the algorithm does not require more than the agent position as a feedback signal. The effectiveness of the proposed method is validated with two simulation scenarios and benchmarked against a similar technique from the literature.
    Energy Management of Multi-mode Plug-in Hybrid Electric Vehicle using Multi-agent Deep Reinforcement Learning. (arXiv:2303.09658v1 [cs.RO])
    The recently emerging multi-mode plug-in hybrid electric vehicle (PHEV) technology is one of the pathways making contributions to decarbonization, and its energy management requires multiple-input and multiple-output (MIMO) control. At the present, the existing methods usually decouple the MIMO control into single-output (MISO) control and can only achieve its local optimal performance. To optimize the multi-mode vehicle globally, this paper studies a MIMO control method for energy management of the multi-mode PHEV based on multi-agent deep reinforcement learning (MADRL). By introducing a relevance ratio, a hand-shaking strategy is proposed to enable two learning agents to work collaboratively under the MADRL framework using the deep deterministic policy gradient (DDPG) algorithm. Unified settings for the DDPG agents are obtained through a sensitivity analysis of the influencing factors to the learning performance. The optimal working mode for the hand-shaking strategy is attained through a parametric study on the relevance ratio. The advantage of the proposed energy management method is demonstrated on a software-in-the-loop testing platform. The result of the study indiates that learning rate of the DDPG agents is the greatest factor in learning performance. Using the unified DDPG settings and a relevance ratio of 0.2, the proposed MADRL method can save up to 4% energy compared to the single-agent method.  ( 3 min )
    SE-GSL: A General and Effective Graph Structure Learning Framework through Structural Entropy Optimization. (arXiv:2303.09778v1 [cs.LG])
    Graph Neural Networks (GNNs) are de facto solutions to structural data learning. However, it is susceptible to low-quality and unreliable structure, which has been a norm rather than an exception in real-world graphs. Existing graph structure learning (GSL) frameworks still lack robustness and interpretability. This paper proposes a general GSL framework, SE-GSL, through structural entropy and the graph hierarchy abstracted in the encoding tree. Particularly, we exploit the one-dimensional structural entropy to maximize embedded information content when auxiliary neighbourhood attributes are fused to enhance the original graph. A new scheme of constructing optimal encoding trees is proposed to minimize the uncertainty and noises in the graph whilst assuring proper community partition in hierarchical abstraction. We present a novel sample-based mechanism for restoring the graph structure via node structural entropy distribution. It increases the connectivity among nodes with larger uncertainty in lower-level communities. SE-GSL is compatible with various GNN models and enhances the robustness towards noisy and heterophily structures. Extensive experiments show significant improvements in the effectiveness and robustness of structure learning and node representation learning.  ( 3 min )
    Short: Basal-Adjust: Trend Prediction Alerts and Adjusted Basal Rates for Hyperglycemia Prevention. (arXiv:2303.09913v1 [cs.LG])
    Significant advancements in type 1 diabetes treatment have been made in the development of state-of-the-art Artificial Pancreas Systems (APS). However, lapses currently exist in the timely treatment of unsafe blood glucose (BG) levels, especially in the case of rebound hyperglycemia. We propose a machine learning (ML) method for predictive BG scenario categorization that outputs messages alerting the patient to upcoming BG trends to allow for earlier, educated treatment. In addition to standard notifications of predicted hypoglycemia and hyperglycemia, we introduce BG scenario-specific alert messages and the preliminary steps toward precise basal suggestions for the prevention of rebound hyperglycemia. Experimental evaluation on the DCLP3 clinical dataset achieves >98% accuracy and >79% precision for predicting rebound high events for patient alerts.  ( 2 min )
    Dual-path Adaptation from Image to Video Transformers. (arXiv:2303.09857v1 [cs.CV])
    In this paper, we efficiently transfer the surpassing representation power of the vision foundation models, such as ViT and Swin, for video understanding with only a few trainable parameters. Previous adaptation methods have simultaneously considered spatial and temporal modeling with a unified learnable module but still suffered from fully leveraging the representative capabilities of image transformers. We argue that the popular dual-path (two-stream) architecture in video models can mitigate this problem. We propose a novel DualPath adaptation separated into spatial and temporal adaptation paths, where a lightweight bottleneck adapter is employed in each transformer block. Especially for temporal dynamic modeling, we incorporate consecutive frames into a grid-like frameset to precisely imitate vision transformers' capability that extrapolates relationships between tokens. In addition, we extensively investigate the multiple baselines from a unified perspective in video understanding and compare them with DualPath. Experimental results on four action recognition benchmarks prove that pretrained image transformers with DualPath can be effectively generalized beyond the data domain.  ( 2 min )
    Efficient Learning of High Level Plans from Play. (arXiv:2303.09628v1 [cs.LG])
    Real-world robotic manipulation tasks remain an elusive challenge, since they involve both fine-grained environment interaction, as well as the ability to plan for long-horizon goals. Although deep reinforcement learning (RL) methods have shown encouraging results when planning end-to-end in high-dimensional environments, they remain fundamentally limited by poor sample efficiency due to inefficient exploration, and by the complexity of credit assignment over long horizons. In this work, we present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL to achieve long-horizon complex manipulation tasks. We leverage task-agnostic play data to learn a discrete behavioral prior over object-centric primitives, modeling their feasibility given the current context. We then design a high-level goal-conditioned policy which (1) uses primitives as building blocks to scaffold complex long-horizon tasks and (2) leverages the behavioral prior to accelerate learning. We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks and learns policies that can be easily transferred to physical hardware.  ( 2 min )
    Disentangling the Link Between Image Statistics and Human Perception. (arXiv:2303.09874v1 [cs.CV])
    In the 1950s Horace Barlow and Fred Attneave suggested a connection between sensory systems and how they are adapted to the environment: early vision evolved to maximise the information it conveys about incoming signals. Following Shannon's definition, this information was described using the probability of the images taken from natural scenes. Previously, direct accurate predictions of image probabilities were not possible due to computational limitations. Despite the exploration of this idea being indirect, mainly based on oversimplified models of the image density or on system design methods, these methods had success in reproducing a wide range of physiological and psychophysical phenomena. In this paper, we directly evaluate the probability of natural images and analyse how it may determine perceptual sensitivity. We employ image quality metrics that correlate well with human opinion as a surrogate of human vision, and an advanced generative model to directly estimate the probability. Specifically, we analyse how the sensitivity of full-reference image quality metrics can be predicted from quantities derived directly from the probability distribution of natural images. First, we compute the mutual information between a wide range of probability surrogates and the sensitivity of the metrics and find that the most influential factor is the probability of the noisy image. Then we explore how these probability surrogates can be combined using a simple model to predict the metric sensitivity, giving an upper bound for the correlation of 0.85 between the model predictions and the actual perceptual sensitivity. Finally, we explore how to combine the probability surrogates using simple expressions, and obtain two functional forms (using one or two surrogates) that can be used to predict the sensitivity of the human visual system given a particular pair of images.  ( 3 min )
    Deep Nonparametric Estimation of Intrinsic Data Structures by Chart Autoencoders: Generalization Error and Robustness. (arXiv:2303.09863v1 [stat.ML])
    Autoencoders have demonstrated remarkable success in learning low-dimensional latent features of high-dimensional data across various applications. Assuming that data are sampled near a low-dimensional manifold, we employ chart autoencoders, which encode data into low-dimensional latent features on a collection of charts, preserving the topology and geometry of the data manifold. Our paper establishes statistical guarantees on the generalization error of chart autoencoders, and we demonstrate their denoising capabilities by considering $n$ noisy training samples, along with their noise-free counterparts, on a $d$-dimensional manifold. By training autoencoders, we show that chart autoencoders can effectively denoise the input data with normal noise. We prove that, under proper network architectures, chart autoencoders achieve a squared generalization error in the order of $\displaystyle n^{-\frac{2}{d+2}}\log^4 n$, which depends on the intrinsic dimension of the manifold and only weakly depends on the ambient dimension and noise level. We further extend our theory on data with noise containing both normal and tangential components, where chart autoencoders still exhibit a denoising effect for the normal component. As a special case, our theory also applies to classical autoencoders, as long as the data manifold has a global parametrization. Our results provide a solid theoretical foundation for the effectiveness of autoencoders, which is further validated through several numerical experiments.  ( 3 min )
    Causal Temporal Graph Convolutional Neural Networks (CTGCN). (arXiv:2303.09634v1 [cs.LG])
    Many large-scale applications can be elegantly represented using graph structures. Their scalability, however, is often limited by the domain knowledge required to apply them. To address this problem, we propose a novel Causal Temporal Graph Convolutional Neural Network (CTGCN). Our CTGCN architecture is based on a causal discovery mechanism, and is capable of discovering the underlying causal processes. The major advantages of our approach stem from its ability to overcome computational scalability problems with a divide and conquer technique, and from the greater explainability of predictions made using a causal model. We evaluate the scalability of our CTGCN on two datasets to demonstrate that our method is applicable to large scale problems, and show that the integration of causality into the TGCN architecture improves prediction performance up to 40% over typical TGCN approach. Our results are obtained without requiring additional domain knowledge, making our approach adaptable to various domains, specifically when little contextual knowledge is available.  ( 2 min )
    Delayed and Indirect Impacts of Link Recommendations. (arXiv:2303.09700v1 [cs.SI])
    The impacts of link recommendations on social networks are challenging to evaluate, and so far they have been studied in limited settings. Observational studies are restricted in the kinds of causal questions they can answer and naive A/B tests often lead to biased evaluations due to unaccounted network interference. Furthermore, evaluations in simulation settings are often limited to static network models that do not take into account the potential feedback loops between link recommendation and organic network evolution. To this end, we study the impacts of recommendations on social networks in dynamic settings. Adopting a simulation-based approach, we consider an explicit dynamic formation model -- an extension of the celebrated Jackson-Rogers model -- and investigate how link recommendations affect network evolution over time. Empirically, we find that link recommendations have surprising delayed and indirect effects on the structural properties of networks. Specifically, we find that link recommendations can exhibit considerably different impacts in the immediate term and in the long term. For instance, we observe that friend-of-friend recommendations can have an immediate effect in decreasing degree inequality, but in the long term, they can make the degree distribution substantially more unequal. Moreover, we show that the effects of recommendations can persist in networks, in part due to their indirect impacts on natural dynamics even after recommendations are turned off. We show that, in counterfactual simulations, removing the indirect effects of link recommendations can make the network trend faster toward what it would have been under natural growth dynamics.  ( 3 min )
    Understanding why shooters shoot -- An AI-powered engine for basketball performance profiling. (arXiv:2303.09715v1 [cs.LG])
    Understanding player shooting profiles is an essential part of basketball analysis: knowing where certain opposing players like to shoot from can help coaches neutralize offensive gameplans from their opponents; understanding where their players are most comfortable can lead them to developing more effective offensive strategies. An automatic tool that can provide these performance profiles in a timely manner can become invaluable for coaches to maximize both the effectiveness of their game plan as well as the time dedicated to practice and other related activities. Additionally, basketball is dictated by many variables, such as playstyle and game dynamics, that can change the flow of the game and, by extension, player performance profiles. It is crucial that the performance profiles can reflect the diverse playstyles, as well as the fast-changing dynamics of the game. We present a tool that can visualize player performance profiles in a timely manner while taking into account factors such as play-style and game dynamics. Our approach generates interpretable heatmaps that allow us to identify and analyze how non-spatial factors, such as game dynamics or playstyle, affect player performance profiles.  ( 3 min )
    Hierarchical-Hyperplane Kernels for Actively Learning Gaussian Process Models of Nonstationary Systems. (arXiv:2303.10022v1 [cs.LG])
    Learning precise surrogate models of complex computer simulations and physical machines often require long-lasting or expensive experiments. Furthermore, the modeled physical dependencies exhibit nonlinear and nonstationary behavior. Machine learning methods that are used to produce the surrogate model should therefore address these problems by providing a scheme to keep the number of queries small, e.g. by using active learning and be able to capture the nonlinear and nonstationary properties of the system. One way of modeling the nonstationarity is to induce input-partitioning, a principle that has proven to be advantageous in active learning for Gaussian processes. However, these methods either assume a known partitioning, need to introduce complex sampling schemes or rely on very simple geometries. In this work, we present a simple, yet powerful kernel family that incorporates a partitioning that: i) is learnable via gradient-based methods, ii) uses a geometry that is more flexible than previous ones, while still being applicable in the low data regime. Thus, it provides a good prior for active learning procedures. We empirically demonstrate excellent performance on various active learning tasks.  ( 2 min )
    Fuzziness-tuned: Improving the Transferability of Adversarial Examples. (arXiv:2303.10078v1 [cs.LG])
    With the development of adversarial attacks, adversairal examples have been widely used to enhance the robustness of the training models on deep neural networks. Although considerable efforts of adversarial attacks on improving the transferability of adversarial examples have been developed, the attack success rate of the transfer-based attacks on the surrogate model is much higher than that on victim model under the low attack strength (e.g., the attack strength $\epsilon=8/255$). In this paper, we first systematically investigated this issue and found that the enormous difference of attack success rates between the surrogate model and victim model is caused by the existence of a special area (known as fuzzy domain in our paper), in which the adversarial examples in the area are classified wrongly by the surrogate model while correctly by the victim model. Then, to eliminate such enormous difference of attack success rates for improving the transferability of generated adversarial examples, a fuzziness-tuned method consisting of confidence scaling mechanism and temperature scaling mechanism is proposed to ensure the generated adversarial examples can effectively skip out of the fuzzy domain. The confidence scaling mechanism and the temperature scaling mechanism can collaboratively tune the fuzziness of the generated adversarial examples through adjusting the gradient descent weight of fuzziness and stabilizing the update direction, respectively. Specifically, the proposed fuzziness-tuned method can be effectively integrated with existing adversarial attacks to further improve the transferability of adverarial examples without changing the time complexity. Extensive experiments demonstrated that fuzziness-tuned method can effectively enhance the transferability of adversarial examples in the latest transfer-based attacks.  ( 3 min )
    Hyper-Reduced Autoencoders for Efficient and Accurate Nonlinear Model Reductions. (arXiv:2303.09630v1 [physics.comp-ph])
    Projection-based model order reduction on nonlinear manifolds has been recently proposed for problems with slowly decaying Kolmogorov n-width such as advection-dominated ones. These methods often use neural networks for manifold learning and showcase improved accuracy over traditional linear subspace-reduced order models. A disadvantage of the previously proposed methods is the potential high computational costs of training the networks on high-fidelity solution snapshots. In this work, we propose and analyze a novel method that overcomes this disadvantage by training a neural network only on subsampled versions of the high-fidelity solution snapshots. This method coupled with collocation-based hyper-reduction and Gappy-POD allows for efficient and accurate surrogate models. We demonstrate the validity of our approach on a 2d Burgers problem.  ( 2 min )
    Batch Updating of a Posterior Tree Distribution over a Meta-Tree. (arXiv:2303.09705v1 [cs.LG])
    Previously, we proposed a probabilistic data generation model represented by an unobservable tree and a sequential updating method to calculate a posterior distribution over a set of trees. The set is called a meta-tree. In this paper, we propose a more efficient batch updating method.  ( 2 min )
    A Bi-LSTM Autoencoder Framework for Anomaly Detection -- A Case Study of a Wind Power Dataset. (arXiv:2303.09703v1 [cs.LG])
    Anomalies refer to data points or events that deviate from normal and homogeneous events, which can include fraudulent activities, network infiltrations, equipment malfunctions, process changes, or other significant but infrequent events. Prompt detection of such events can prevent potential losses in terms of finances, information, and human resources. With the advancement of computational capabilities and the availability of large datasets, anomaly detection has become a major area of research. Among these, anomaly detection in time series has gained more attention recently due to the added complexity imposed by the time dimension. This study presents a novel framework for time series anomaly detection using a combination of Bidirectional Long Short Term Memory (Bi-LSTM) architecture and Autoencoder. The Bi-LSTM network, which comprises two unidirectional LSTM networks, can analyze the time series data from both directions and thus effectively discover the long-term dependencies hidden in the sequential data. Meanwhile, the Autoencoder mechanism helps to establish the optimal threshold beyond which an event can be classified as an anomaly. To demonstrate the effectiveness of the proposed framework, it is applied to a real-world multivariate time series dataset collected from a wind farm. The Bi-LSTM Autoencoder model achieved a classification accuracy of 96.79% and outperformed more commonly used LSTM Autoencoder models.  ( 3 min )
    A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games. (arXiv:2303.09716v1 [cs.LG])
    Many model-based reinforcement learning (RL) algorithms can be viewed as having two phases that are iteratively implemented: a learning phase where the model is approximately learned and a planning phase where the learned model is used to derive a policy. In the case of standard MDPs, the learning problem can be solved using either value iteration or policy iteration. However, in the case of zero-sum Markov games, there is no efficient policy iteration algorithm; e.g., it has been shown in Hansen et al. (2013) that one has to solve Omega(1/(1-alpha)) MDPs, where alpha is the discount factor, to implement the only known convergent version of policy iteration. Another algorithm for Markov zero-sum games, called naive policy iteration, is easy to implement but is only provably convergent under very restrictive assumptions. Prior attempts to fix naive policy iteration algorithm have several limitations. Here, we show that a simple variant of naive policy iteration for games converges, and converges exponentially fast. The only addition we propose to naive policy iteration is the use of lookahead in the policy improvement phase. This is appealing because lookahead is anyway often used in RL for games. We further show that lookahead can be implemented efficiently in linear Markov games, which are the counterpart of the linear MDPs and have been the subject of much attention recently. We then consider multi-agent reinforcement learning which uses our algorithm in the planning phases, and provide sample and time complexity bounds for such an algorithm.  ( 3 min )
    It Is All About Data: A Survey on the Effects of Data on Adversarial Robustness. (arXiv:2303.09767v1 [cs.LG])
    Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to confuse the model into making a mistake. Such examples pose a serious threat to the applicability of machine-learning-based systems, especially in life- and safety-critical domains. To address this problem, the area of adversarial robustness investigates mechanisms behind adversarial attacks and defenses against these attacks. This survey reviews literature that focuses on the effects of data used by a model on the model's adversarial robustness. It systematically identifies and summarizes the state-of-the-art research in this area and further discusses gaps of knowledge and promising future research directions.  ( 2 min )
    Predicting discrete-time bifurcations with deep learning. (arXiv:2303.09669v1 [q-bio.QM])
    Many natural and man-made systems are prone to critical transitions -- abrupt and potentially devastating changes in dynamics. Deep learning classifiers can provide an early warning signal (EWS) for critical transitions by learning generic features of bifurcations (dynamical instabilities) from large simulated training data sets. So far, classifiers have only been trained to predict continuous-time bifurcations, ignoring rich dynamics unique to discrete-time bifurcations. Here, we train a deep learning classifier to provide an EWS for the five local discrete-time bifurcations of codimension-1. We test the classifier on simulation data from discrete-time models used in physiology, economics and ecology, as well as experimental data of spontaneously beating chick-heart aggregates that undergo a period-doubling bifurcation. The classifier outperforms commonly used EWS under a wide range of noise intensities and rates of approach to the bifurcation. It also predicts the correct bifurcation in most cases, with particularly high accuracy for the period-doubling, Neimark-Sacker and fold bifurcations. Deep learning as a tool for bifurcation prediction is still in its nascence and has the potential to transform the way we monitor systems for critical transitions.  ( 2 min )
    Decentralized Riemannian natural gradient methods with Kronecker-product approximations. (arXiv:2303.09611v1 [math.OC])
    With a computationally efficient approximation of the second-order information, natural gradient methods have been successful in solving large-scale structured optimization problems. We study the natural gradient methods for the large-scale decentralized optimization problems on Riemannian manifolds, where the local objective function defined by the local dataset is of a log-probability type. By utilizing the structure of the Riemannian Fisher information matrix (RFIM), we present an efficient decentralized Riemannian natural gradient descent (DRNGD) method. To overcome the communication issue of the high-dimension RFIM, we consider a class of structured problems for which the RFIM can be approximated by a Kronecker product of two low-dimension matrices. By performing the communications over the Kronecker factors, a high-quality approximation of the RFIM can be obtained in a low cost. We prove that DRNGD converges to a stationary point with the best-known rate of $\mathcal{O}(1/K)$. Numerical experiments demonstrate the efficiency of our proposed method compared with the state-of-the-art ones. To the best of our knowledge, this is the first Riemannian second-order method for solving decentralized manifold optimization problems.  ( 2 min )
    Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation. (arXiv:2303.09732v1 [cs.CR])
    Copyright protection for deep neural networks (DNNs) is an urgent need for AI corporations. To trace illegally distributed model copies, DNN watermarking is an emerging technique for embedding and verifying secret identity messages in the prediction behaviors or the model internals. Sacrificing less functionality and involving more knowledge about the target DNN, the latter branch called \textit{white-box DNN watermarking} is believed to be accurate, credible and secure against most known watermark removal attacks, with emerging research efforts in both the academy and the industry. In this paper, we present the first systematic study on how the mainstream white-box DNN watermarks are commonly vulnerable to neural structural obfuscation with \textit{dummy neurons}, a group of neurons which can be added to a target model but leave the model behavior invariant. Devising a comprehensive framework to automatically generate and inject dummy neurons with high stealthiness, our novel attack intensively modifies the architecture of the target model to inhibit the success of watermark verification. With extensive evaluation, our work for the first time shows that nine published watermarking schemes require amendments to their verification procedures.  ( 2 min )
    GADFormer: An Attention-based Model for Group Anomaly Detection on Trajectories. (arXiv:2303.09841v1 [cs.LG])
    Group Anomaly Detection (GAD) reveals anomalous behavior among groups consisting of multiple member instances, which are, individually considered, not necessarily anomalous. This task is of major importance across multiple disciplines, in which also sequences like trajectories can be considered as a group. However, with increasing amount and heterogenity of group members, actual abnormal groups get harder to detect, especially in an unsupervised or semi-supervised setting. Recurrent Neural Networks are well established deep sequence models, but recent works have shown that their performance can decrease with increasing sequence lengths. Hence, we introduce with this paper GADFormer, a GAD specific BERT architecture, capable to perform attention-based Group Anomaly Detection on trajectories in an unsupervised and semi-supervised setting. We show formally and experimentally how trajectory outlier detection can be realized as an attention-based Group Anomaly Detection problem. Furthermore, we introduce a Block Attention-anomaly Score (BAS) to improve the interpretability of transformer encoder blocks for GAD. In addition to that, synthetic trajectory generation allows us to optimize the training for domain-specific GAD. In extensive experiments we investigate our approach versus GRU in their robustness for trajectory noise and novelties on synthetic and real world datasets.  ( 2 min )
    Probabilistic relations for modelling epistemic and aleatoric uncertainty: its semantics and automated reasoning with theorem proving. (arXiv:2303.09692v1 [cs.LO])
    Probabilistic programming is a programming paradigm that combines general computer programming, statistical inference, and formal semantics to help systems to made decisions when facing uncertainty. Probabilistic programs are ubiquitous and believed to have a major impact on machine intelligence. While many probabilistic algorithms have been used in practice in different domains, their automated verification based on formal semantics is still a relatively new research area. In the last two decades, it has been attracting a lot of interest. Many challenges, however, still remain. Our work presented in this paper, probabilistic relations, takes a step into our vision to tackle these challenges. Our work in essence is based on Hehner's predicative probabilistic programming, but there are several obstacles to the wider adoption of his work. Our contributions here include (1) the formalisation of its syntax and semantics by introducing an Iverson bracket notation to separate relations from arithmetic; (2) the formalisation of relations using Unifying Theories of Programming (UTP) and probabilities outside the brackets using summation over the topological space of the real numbers; (3) the constructive semantics for probabilistic loops using the Kleene's fixed point theorem; (4) the enrichment of its semantics from distributions to subdistributions and superdistributions in order to deal with the constructive semantics; (5) the unique fixed point theorem to largely simplify the reasoning about probabilistic loops; and (6) the mechanisation of our theory in Isabelle/UTP, an implementation of UTP in Isabelle/HOL, for automated reasoning using theorem proving. We demonstrate six interesting examples, and among them, one is about robot localisation, two are classification problems in machine learning, and two contain probabilistic loops.  ( 3 min )
    Detecting Out-of-distribution Examples via Class-conditional Impressions Reappearing. (arXiv:2303.09746v1 [cs.LG])
    Out-of-distribution (OOD) detection aims at enhancing standard deep neural networks to distinguish anomalous inputs from original training data. Previous progress has introduced various approaches where the in-distribution training data and even several OOD examples are prerequisites. However, due to privacy and security, auxiliary data tends to be impractical in a real-world scenario. In this paper, we propose a data-free method without training on natural data, called Class-Conditional Impressions Reappearing (C2IR), which utilizes image impressions from the fixed model to recover class-conditional feature statistics. Based on that, we introduce Integral Probability Metrics to estimate layer-wise class-conditional deviations and obtain layer weights by Measuring Gradient-based Importance (MGI). The experiments verify the effectiveness of our method and indicate that C2IR outperforms other post-hoc methods and reaches comparable performance to the full access (ID and OOD) detection method, especially in the far-OOD dataset (SVHN).  ( 2 min )
    Visual Analytics of Multivariate Networks with Representation Learning and Composite Variable Construction. (arXiv:2303.09590v1 [cs.SI])
    Multivariate networks are commonly found in real-world data-driven applications. Uncovering and understanding the relations of interest in multivariate networks is not a trivial task. This paper presents a visual analytics workflow for studying multivariate networks to extract associations between different structural and semantic characteristics of the networks (e.g., what are the combinations of attributes largely relating to the density of a social network?). The workflow consists of a neural-network-based learning phase to classify the data based on the chosen input and output attributes, a dimensionality reduction and optimization phase to produce a simplified set of results for examination, and finally an interpreting phase conducted by the user through an interactive visualization interface. A key part of our design is a composite variable construction step that remodels nonlinear features obtained by neural networks into linear features that are intuitive to interpret. We demonstrate the capabilities of this workflow with multiple case studies on networks derived from social media usage and also evaluate the workflow through an expert interview.  ( 2 min )
  • Open

    Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs. (arXiv:2303.10165v1 [cs.LG])
    We study reward-free reinforcement learning (RL) with linear function approximation, where the agent works in two phases: (1) in the exploration phase, the agent interacts with the environment but cannot access the reward; and (2) in the planning phase, the agent is given a reward function and is expected to find a near-optimal policy based on samples collected in the exploration phase. The sample complexities of existing reward-free algorithms have a polynomial dependence on the planning horizon, which makes them intractable for long planning horizon RL problems. In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs), where the transition probability can be parameterized as a linear combination of known feature mappings. At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward and a high-order moment estimator for the aleatoric and epistemic uncertainties. When the total reward is bounded by $1$, we show that our algorithm only needs to explore $\tilde O( d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy, where $d$ is the dimension of the feature mapping. The sample complexity of our algorithm only has a polylogarithmic dependence on the planning horizon and therefore is ``horizon-free''. In addition, we provide an $\Omega(d^2\varepsilon^{-2})$ sample complexity lower bound, which matches the sample complexity of our algorithm up to logarithmic factors, suggesting that our algorithm is optimal.  ( 3 min )
    Optimizing Orthogonalized Tensor Deflation via Random Tensor Theory. (arXiv:2302.05798v2 [stat.ML] UPDATED)
    This paper tackles the problem of recovering a low-rank signal tensor with possibly correlated components from a random noisy tensor, or so-called spiked tensor model. When the underlying components are orthogonal, they can be recovered efficiently using tensor deflation which consists of successive rank-one approximations, while non-orthogonal components may alter the tensor deflation mechanism, thereby preventing efficient recovery. Relying on recently developed random tensor tools, this paper deals precisely with the non-orthogonal case by deriving an asymptotic analysis of a parameterized deflation procedure performed on an order-three and rank-two spiked tensor. Based on this analysis, an efficient tensor deflation algorithm is proposed by optimizing the parameter introduced in the deflation mechanism, which in turn is proven to be optimal by construction for the studied tensor model. The same ideas could be extended to more general low-rank tensor models, e.g., higher ranks and orders, leading to more efficient tensor methods with a broader impact on machine learning and beyond.  ( 2 min )
    CausalEGM: a general causal inference framework by encoding generative modeling. (arXiv:2212.05925v4 [stat.ML] UPDATED)
    Although understanding and characterizing causal effects have become essential in observational studies, it is challenging when the confounders are high-dimensional. In this article, we develop a general framework $\textit{CausalEGM}$ for estimating causal effects by encoding generative modeling, which can be applied in both binary and continuous treatment settings. Under the potential outcome framework with unconfoundedness, we establish a bidirectional transformation between the high-dimensional confounders space and a low-dimensional latent space where the density is known (e.g., multivariate normal distribution). Through this, CausalEGM simultaneously decouples the dependencies of confounders on both treatment and outcome and maps the confounders to the low-dimensional latent space. By conditioning on the low-dimensional latent features, CausalEGM can estimate the causal effect for each individual or the average causal effect within a population. Our theoretical analysis shows that the excess risk for CausalEGM can be bounded through empirical process theory. Under an assumption on encoder-decoder networks, the consistency of the estimate can be guaranteed. In a series of experiments, CausalEGM demonstrates superior performance over existing methods for both binary and continuous treatments. Specifically, we find CausalEGM to be substantially more powerful than competing methods in the presence of large sample sizes and high dimensional confounders. The software of CausalEGM is freely available at https://github.com/SUwonglab/CausalEGM.  ( 3 min )
    Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices. (arXiv:2303.10019v1 [stat.ML])
    This paper presents a new method for combining (or aggregating or ensembling) multivariate probabilistic forecasts, taking into account dependencies between quantiles and covariates through a smoothing procedure that allows for online learning. Two smoothing methods are discussed: dimensionality reduction using Basis matrices and penalized smoothing. The new online learning algorithm generalizes the standard CRPS learning framework into multivariate dimensions. It is based on Bernstein Online Aggregation (BOA) and yields optimal asymptotic learning properties. We provide an in-depth discussion on possible extensions of the algorithm and several nested cases related to the existing literature on online forecast combination. The methodology is applied to forecasting day-ahead electricity prices, which are 24-dimensional distributional forecasts. The proposed method yields significant improvements over uniform combination in terms of continuous ranked probability score (CRPS). We discuss the temporal evolution of the weights and hyperparameters and present the results of reduced versions of the preferred model. A fast C++ implementation of all discussed methods is provided in the R-Package profoc.  ( 3 min )
    Learning time-scales in two-layers neural networks. (arXiv:2303.00055v2 [cs.LG] UPDATED)
    Gradient-based learning in multi-layer neural networks displays a number of striking features. In particular, the decrease rate of empirical risk is non-monotone even after averaging over large batches. Long plateaus in which one observes barely any progress alternate with intervals of rapid decrease. These successive phases of learning often take place on very different time scales. Finally, models learnt in an early phase are typically `simpler' or `easier to learn' although in a way that is difficult to formalize. Although theoretical explanations of these phenomena have been put forward, each of them captures at best certain specific regimes. In this paper, we study the gradient flow dynamics of a wide two-layer neural network in high-dimension, when data are distributed according to a single-index model (i.e., the target function depends on a one-dimensional projection of the covariates). Based on a mixture of new rigorous results, non-rigorous mathematical derivations, and numerical simulations, we propose a scenario for the learning dynamics in this setting. In particular, the proposed evolution exhibits separation of timescales and intermittency. These behaviors arise naturally because the population gradient flow can be recast as a singularly perturbed dynamical system.  ( 2 min )
    Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model. (arXiv:2005.12900v7 [cs.LG] UPDATED)
    This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$. The current paper overcomes this barrier by certifying the minimax optimality of two algorithms -- a perturbed model-based algorithm and a conservative model-based algorithm -- as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs, and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible).  ( 3 min )
    Robust probabilistic inference via a constrained transport metric. (arXiv:2303.10085v1 [stat.ME])
    Flexible Bayesian models are typically constructed using limits of large parametric models with a multitude of parameters that are often uninterpretable. In this article, we offer a novel alternative by constructing an exponentially tilted empirical likelihood carefully designed to concentrate near a parametric family of distributions of choice with respect to a novel variant of the Wasserstein metric, which is then combined with a prior distribution on model parameters to obtain a robustified posterior. The proposed approach finds applications in a wide variety of robust inference problems, where we intend to perform inference on the parameters associated with the centering distribution in presence of outliers. Our proposed transport metric enjoys great computational simplicity, exploiting the Sinkhorn regularization for discrete optimal transport problems, and being inherently parallelizable. We demonstrate superior performance of our methodology when compared against state-of-the-art robust Bayesian inference methods. We also demonstrate equivalence of our approach with a nonparametric Bayesian formulation under a suitable asymptotic framework, testifying to its flexibility. The constrained entropy maximization that sits at the heart of our likelihood formulation finds its utility beyond robust Bayesian inference; an illustration is provided in a trustworthy machine learning application.  ( 2 min )
    Finding Competence Regions in Domain Generalization. (arXiv:2303.09989v1 [cs.LG])
    We propose a "learning to reject" framework to address the problem of silent failures in Domain Generalization (DG), where the test distribution differs from the training distribution. Assuming a mild distribution shift, we wish to accept out-of-distribution (OOD) data whenever a model's estimated competence foresees trustworthy responses, instead of rejecting OOD data outright. Trustworthiness is then predicted via a proxy incompetence score that is tightly linked to the performance of a classifier. We present a comprehensive experimental evaluation of incompetence scores for classification and highlight the resulting trade-offs between rejection rate and accuracy gain. For comparability with prior work, we focus on standard DG benchmarks and consider the effect of measuring incompetence via different learned representations in a closed versus an open world setting. Our results suggest that increasing incompetence scores are indeed predictive of reduced accuracy, leading to significant improvements of the average accuracy below a suitable incompetence threshold. However, the scores are not yet good enough to allow for a favorable accuracy/rejection trade-off in all tested domains. Surprisingly, our results also indicate that classifiers optimized for DG robustness do not outperform a naive Empirical Risk Minimization (ERM) baseline in the competence region, that is, where test samples elicit low incompetence scores.  ( 2 min )
    Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting. (arXiv:2303.10144v1 [cs.LG])
    Early stopping based on the validation set performance is a popular approach to find the right balance between under- and overfitting in the context of supervised learning. However, in reinforcement learning, even for supervised sub-problems such as world model learning, early stopping is not applicable as the dataset is continually evolving. As a solution, we propose a new general method that dynamically adjusts the update to data (UTD) ratio during training based on under- and overfitting detection on a small subset of the continuously collected experience not used for training. We apply our method to DreamerV2, a state-of-the-art model-based reinforcement learning algorithm, and evaluate it on the DeepMind Control Suite and the Atari $100$k benchmark. The results demonstrate that one can better balance under- and overestimation by adjusting the UTD ratio with our approach compared to the default setting in DreamerV2 and that it is competitive with an extensive hyperparameter search which is not feasible for many applications. Our method eliminates the need to set the UTD hyperparameter by hand and even leads to a higher robustness with regard to other learning-related hyperparameters further reducing the amount of necessary tuning.  ( 2 min )
    Inadmissibility of the corrected Akaike information criterion. (arXiv:2211.09326v2 [math.ST] UPDATED)
    For the multivariate linear regression model with unknown covariance, the corrected Akaike information criterion is the minimum variance unbiased estimator of the expected Kullback--Leibler discrepancy. In this study, based on the loss estimation framework, we show its inadmissibility as an estimator of the Kullback--Leibler discrepancy itself, instead of the expected Kullback--Leibler discrepancy. We provide improved estimators of the Kullback--Leibler discrepancy that work well in reduced-rank situations and examine their performance numerically.  ( 2 min )
    A Non-Asymptotic Framework for Approximate Message Passing in Spiked Models. (arXiv:2208.03313v2 [math.ST] UPDATED)
    Approximate message passing (AMP) emerges as an effective iterative paradigm for solving high-dimensional statistical problems. However, prior AMP theory -- which focused mostly on high-dimensional asymptotics -- fell short of predicting the AMP dynamics when the number of iterations surpasses $o\big(\frac{\log n}{\log\log n}\big)$ (with $n$ the problem dimension). To address this inadequacy, this paper develops a non-asymptotic framework for understanding AMP in spiked matrix estimation. Built upon new decomposition of AMP updates and controllable residual terms, we lay out an analysis recipe to characterize the finite-sample behavior of AMP in the presence of an independent initialization, which is further generalized to allow for spectral initialization. As two concrete consequences of the proposed analysis recipe: (i) when solving $\mathbb{Z}_2$ synchronization, we predict the behavior of spectrally initialized AMP for up to $O\big(\frac{n}{\mathrm{poly}\log n}\big)$ iterations, showing that the algorithm succeeds without the need of a subsequent refinement stage (as conjectured recently by \citet{celentano2021local}); (ii) we characterize the non-asymptotic behavior of AMP in sparse PCA (in the spiked Wigner model) for a broad range of signal-to-noise ratio.  ( 2 min )
    Generalized partitioned local depth. (arXiv:2303.10167v1 [stat.ML])
    In this paper we provide a generalization of the concept of cohesion as introduced recently by Berenhaut, Moore and Melvin [Proceedings of the National Academy of Sciences, 119 (4) (2022)]. The formulation presented builds on the technique of partitioned local depth by distilling two key probabilistic concepts: local relevance and support division. Earlier results are extended within the new context, and examples of applications to revealing communities in data with uncertainty are included.  ( 2 min )
    Deep Nonparametric Estimation of Intrinsic Data Structures by Chart Autoencoders: Generalization Error and Robustness. (arXiv:2303.09863v1 [stat.ML])
    Autoencoders have demonstrated remarkable success in learning low-dimensional latent features of high-dimensional data across various applications. Assuming that data are sampled near a low-dimensional manifold, we employ chart autoencoders, which encode data into low-dimensional latent features on a collection of charts, preserving the topology and geometry of the data manifold. Our paper establishes statistical guarantees on the generalization error of chart autoencoders, and we demonstrate their denoising capabilities by considering $n$ noisy training samples, along with their noise-free counterparts, on a $d$-dimensional manifold. By training autoencoders, we show that chart autoencoders can effectively denoise the input data with normal noise. We prove that, under proper network architectures, chart autoencoders achieve a squared generalization error in the order of $\displaystyle n^{-\frac{2}{d+2}}\log^4 n$, which depends on the intrinsic dimension of the manifold and only weakly depends on the ambient dimension and noise level. We further extend our theory on data with noise containing both normal and tangential components, where chart autoencoders still exhibit a denoising effect for the normal component. As a special case, our theory also applies to classical autoencoders, as long as the data manifold has a global parametrization. Our results provide a solid theoretical foundation for the effectiveness of autoencoders, which is further validated through several numerical experiments.  ( 3 min )
    Error Bounds for Kernel-Based Linear System Identification with Unknown Hyperparameters. (arXiv:2303.09842v1 [eess.SY])
    The kernel-based method has been successfully applied in linear system identification using stable kernel designs. From a Gaussian process perspective, it automatically provides probabilistic error bounds for the identified models from the posterior covariance, which are useful in robust and stochastic control. However, the error bounds require knowledge of the true hyperparameters in the kernel design and are demonstrated to be inaccurate with estimated hyperparameters for lightly damped systems or in the presence of high noise. In this work, we provide reliable quantification of the estimation error when the hyperparameters are unknown. The bounds are obtained by first constructing a high-probability set for the true hyperparameters from the marginal likelihood function and then finding the worst-case posterior covariance within the set. The proposed bound is proven to contain the true model with a high probability and its validity is verified in numerical simulation.  ( 2 min )
    Neural-prior stochastic block model. (arXiv:2303.09995v1 [cond-mat.dis-nn])
    The stochastic block model (SBM) is widely studied as a benchmark for graph clustering aka community detection. In practice, graph data often come with node attributes that bear additional information about the communities. Previous works modeled such data by considering that the node attributes are generated from the node community memberships. In this work, motivated by a recent surge of works in signal processing using deep neural networks as priors, we propose to model the communities as being determined by the node attributes rather than the opposite. We define the corresponding model; we call it the neural-prior SBM. We propose an algorithm, stemming from statistical physics, based on a combination of belief propagation and approximate message passing. We analyze the performance of the algorithm as well as the Bayes-optimal performance. We identify detectability and exact recovery phase transitions, as well as an algorithmically hard region. The proposed model and algorithm can be used as a benchmark for both theory and algorithms. To illustrate this, we compare the optimal performances to the performance of simple graph neural networks.  ( 3 min )
    A Robustness Analysis of Blind Source Separation. (arXiv:2303.10104v1 [math.ST])
    Blind source separation (BSS) aims to recover an unobserved signal $S$ from its mixture $X=f(S)$ under the condition that the effecting transformation $f$ is invertible but unknown. As this is a basic problem with many practical applications, a fundamental issue is to understand how the solutions to this problem behave when their supporting statistical prior assumptions are violated. In the classical context of linear mixtures, we present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$. Modelling $S$ as a multidimensional stochastic process, we introduce an informative topology on the space of possible causes underlying a mixture $X$, and show that the behaviour of a generic BSS-solution in response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees with respect to this topology. This allows for a flexible and convenient quantification of general model uncertainty scenarios and amounts to the first comprehensive robustness framework for BSS. Our approach is entirely constructive, and we demonstrate its utility with novel theoretical guarantees for a number of statistical applications.  ( 2 min )
    On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering. (arXiv:2303.09877v1 [stat.ML])
    Self-supervised learning is a central component in recent approaches to deep multi-view clustering (MVC). However, we find large variations in the development of self-supervision-based methods for deep MVC, potentially slowing the progress of the field. To address this, we present DeepMVC, a unified framework for deep MVC that includes many recent methods as instances. We leverage our framework to make key observations about the effect of self-supervision, and in particular, drawbacks of aligning representations with contrastive learning. Further, we prove that contrastive alignment can negatively influence cluster separability, and that this effect becomes worse when the number of views increases. Motivated by our findings, we develop several new DeepMVC instances with new forms of self-supervision. We conduct extensive experiments and find that (i) in line with our theoretical findings, contrastive alignments decreases performance on datasets with many views; (ii) all methods benefit from some form of self-supervision; and (iii) our new instances outperform previous methods on several datasets. Based on our results, we suggest several promising directions for future research. To enhance the openness of the field, we provide an open-source implementation of DeepMVC, including recent models and our new instances. Our implementation includes a consistent evaluation protocol, facilitating fair and accurate evaluation of methods and components.  ( 3 min )
    Batch Updating of a Posterior Tree Distribution over a Meta-Tree. (arXiv:2303.09705v1 [cs.LG])
    Previously, we proposed a probabilistic data generation model represented by an unobservable tree and a sequential updating method to calculate a posterior distribution over a set of trees. The set is called a meta-tree. In this paper, we propose a more efficient batch updating method.  ( 2 min )

  • Open

    Stable Diffusion Video2Video I changed my friend's ethnicity just for fun... and I am loving the result...
    submitted by /u/navalguijo [link] [comments]  ( 41 min )
    EPFL vs Ecole Polytechnique (L'X) in AI
    I got accepted in MSc&T in Artificial intelligence and advanced visual computing at Ecole polytechnique de Paris (l'X) and I got an excellence scholarship. I also got accepted in Msc in Data Science at EPFL. Which one should I choose and why? 😅 Which country has better job opportunities? and is it easy to work in Switzerland after choosing to pursue the master at L'X? (My goal is to get a PhD and carve out a career at a revolutionary company such as DeepMind) View Poll submitted by /u/Hopeful-Noise671 [link] [comments]  ( 7 min )
    Naruto drawing
    Just messing around with Crayion until I found this drawing. https://preview.redd.it/qofi9mfz9roa1.png?width=1024&format=png&auto=webp&s=b7ad6f2e9a1712b7f44512c59aaadcd23f87c45f It is one of the best drawings I have ever seen made by this a.i. so I wanted to share it with the Reddit community. 19/03/23 submitted by /u/TalkinBen2000 [link] [comments]  ( 41 min )
    Game of Thrones but as a 90s sitcom (midjourney v5)
    submitted by /u/fignewtgingrich [link] [comments]  ( 41 min )
    Watch while you're High - Trippy Mushrooms (AI Dream 197)
    submitted by /u/LordPewPew777 [link] [comments]  ( 41 min )
    LERF is like Google for the Metaverse
    submitted by /u/Peaking_AI [link] [comments]  ( 41 min )
    Just created a Fake PC Game as an April's Fool for my Friends with AI - and they are eagerly awaiting it now!
    Short Summary: Currently convincing my friends to together start a new Game called Elysium, coming out on April 1st. This Game is pure Fake and does not exist. They are all in and are eager to explore the Worlds of a non existing Game! https://www.elysium-game.cloud/ Long Background Story: So I played around with ChatGPT (v3.5) and tried to play games with it in the Chat. It did work partially, it created some rules for games on the fly and i also tried to visualize some sorts of Playing Fields as well. In parallel, I tried out the latest Midjourney (v5.0) and was really surprised by the results. So it suddenly hit me to create a Fake Game purely based on those two AI Tools. I asked ChatGPT to create a title for an adventure game and the first answer was already perfect: "Elysium: The…  ( 46 min )
    ChatGPT’s Potential To Eliminate Jobs Scares OpenAI’s Founder Sam Altman
    submitted by /u/liquidocelotYT [link] [comments]  ( 41 min )
    MeinaMix Model Test using SD and Controlnet
    submitted by /u/oridnary_artist [link] [comments]  ( 41 min )
    GiftGo- Gifting app powered by chatgpt
    Hi guys just wanted to share a new app I worked on which uses chatgpt to reccomend gift ideas based on interests and remind you of birthdays. Please let me know what you think :) https://apps.apple.com/de/app/giftgo-gift-ideas-with-ai/id1660850886?l=en submitted by /u/SmoresDaniel [link] [comments]  ( 41 min )
    question
    what is the best free ai art generator that can generate nsfw? submitted by /u/FloofEmily [link] [comments]  ( 6 min )
    Non-compensated survey for a college course :))
    Hello everybody! I am a third-year at the University of Michigan studying Communications and Media. For one of my classes, I must conduct research to write a report on racial inequities ingrained in the code of facial recognition AI, and the ways in which this affects how people of different races are able to use it. This survey is completely anonymous and my paper is just for a course, meaning it will NOT be published or used publicly. This survey is FREE and I will NOT be giving out any compensation (sorry) so please take at your own discretion and answer to the best of your ability! Thank you so much!! submitted by /u/nellconnelly [link] [comments]  ( 41 min )
    I created a WhatsApp ChatGPT3 bot.
    . Hello , I created a chatgpt whatsapp bot for anyone to use from anywhere around the word especially for people where ChatGpt is blocked. Simply Message at this number +97334467689. *** notice *** The response time is a bit slow. I'm working on it. Please use it wisely. submitted by /u/Sad-Copy-1728 [link] [comments]  ( 6 min )
    Who knows how to create long-form & cheap AI avatar content? The three main platforms (Synthesia, Movio, & D-ID) all charge over $20 a month for ~ 15 minutes of content, but this TikTok user streamed for 90 hours… how did he pull that off?
    submitted by /u/northernmostroasts [link] [comments]  ( 41 min )
    Is there any open-source version of Adobe Liquid Mode or something like that?
    Adobe Acrobat Liquid Mode mobile reading experience powered by Adobe Sensei, Adobe’s artificial intelligence (AI), and machine learning technology. It enhances our PDF layout and helps us easily read documents on our phones and tablet. But it has a limitation it will not work if the pdf is more than 200 pages or it is larger than 10 MB. So I was asking if is there any open-source version of it that can run locally without any limit. submitted by /u/Ill-Information9505 [link] [comments]  ( 41 min )
    Simulating AGI Box Experiment With GPT4
    submitted by /u/linguistInAPoncho [link] [comments]  ( 6 min )
    AI is essentially learning in Plato's Cave
    submitted by /u/RhythmRobber [link] [comments]  ( 56 min )
    What is the best software to take many images of myself and then put artificial backgrounds?
    I'm seeking something that appears entirely authentic. frequently encounter advertisements for various options, but it's challenging to determine which one is trustworthy. Can anyone recommend the most reliable choice? I'd greatly appreciate your help. submitted by /u/madmatt1980 [link] [comments]  ( 41 min )
    Discover The Mysterious Planet That Kubrick Detected In The Depths Of Outer Space!
    submitted by /u/Calatravo [link] [comments]  ( 41 min )
    A lot of people are using Chatgpt and don't really know what it is, how it works and the very real problems it has. Here's a *very* simplified explanation of the technology thats changing the world
    submitted by /u/lostlifon [link] [comments]  ( 50 min )
    I got access to gpt-4 and I am using it for the betterment of *checks notes* society.
    submitted by /u/HolyOtherness [link] [comments]  ( 45 min )
    Alpaca 65B
    The LLaMA base models are up to 65B large, "LLaMA available at several sizes (7B, 13B, 33B, and 65B parameters)" Even if that requires 10x the amount of training data it would still cost only 6000$. Why don't we train an Alpaca 65B model? submitted by /u/tomd_96 [link] [comments]  ( 41 min )
    AI projects during undergrad
    Hey guys hope this is the right place to ask this. I'm taking CS from WGU (prestudying right now) and am wondering if I should take a course that teaches me python and then specializes in AI. This way I would be prepared for some upcoming classes and I would have some projects under my belt for internships. Do you think this is a good idea or should I wait to even get into AI until graduating. Like do you think I would end up wasting time studying things that wouldn't get me closer to graduating? Thanks. submitted by /u/No_Psychology_3267 [link] [comments]  ( 41 min )
  • Open

    [D] Modern Topic Modeling/Discovery
    I was wondering what are the modern techniques for topic discovery for short and long text. It seems this topic to be slower advancing compared to the rest. I am aware of bertopic but tbh I always have issues finetuning it. On a second thought I was thinking to use qna/chat gpt models in order to generate models, so I wanted to ask your opinion on some potential prompts that I could use. Essentially a bit of brainstorming. I will open source all the gathered ideas along with mine and share the link here :) submitted by /u/toavepa [link] [comments]  ( 44 min )
    [R] 🤖🌟 Unlock the Power of Personal AI: Introducing ChatLLaMA, Your Custom Personal Assistant! 🚀💬
    🚀 Introducing ChatLLaMA: Your Personal AI Assistant Powered by LoRA! 🤖 ​ Hey AI enthusiasts! 🌟 We're excited to announce that you can now create custom personal assistants that run directly on your GPUs! ​ ChatLLaMA utilizes LoRA, trained on Anthropic's HH dataset, to model seamless conversations between an AI assistant and users. ​ Plus, the RLHF version of LoRA is coming soon! 🔥 ​ 👉 Get it here: https://cxn.to/@serpai/lora-weights ​ 📚 Know any high-quality dialogue-style datasets? Share them with us, and we'll train ChatLLaMA on them! ​ 🌐 ChatLLaMA is currently available for 30B and 13B models, with the 7B version coming soon. ​ 🔔 Want to stay in the loop for new ChatLLaMA updates? Grab the FREE [gumroad link](https://cxn.to/@serpai/lora-weights) to sign up and access a collection of links, tutorials, and guides on running the model, merging weights, and more. (Guides on running and training the model coming soon) ​ 🤔 Have questions or need help setting up ChatLLaMA? Drop a comment or DM us, and we'll be more than happy to help you out! 💬 ​ Let's revolutionize AI-assisted conversations together! 🌟 ​ *Disclaimer: trained for research, no foundation model weights, and the post was ran through gpt4 to make it more coherent. ​ 👉 Get it here: https://cxn.to/@serpai/lora-weights submitted by /u/kittenkrazy [link] [comments]  ( 46 min )
    [D] For those who have worked 5+ years in the field, what are you up to now?
    Hey,I've been working in ML for the last 5-10 years, almost since deep learning came up, mostly startups with various successes, sometimes doing more research sometimes doing more engineering tasks. This year I was excited to manage a team focused on ML but plans have just changed in my company and this isn't going to happen anytime soon. I am interviewing for another company for a "senior" role but they still ask me to do this basic ML assignment that takes a few hours to complete. I have just realised how boring it became to me to the point I don't even want to do it as I literally have done this for the last 5+ years. For those who have been in the field for at least 5+ years, what are you up to now? Are you still doing ML? Have you moved into management? Have you started your own company? Moved to a different subfield perhaps? PS: I have a PhD, in addition to those years of experience i mention. I am aware this isn't related to ML purely but more like mid-career crisis, but would be interested to know how people in the field have dealt with it. submitted by /u/NoSeaweed8543 [link] [comments]  ( 45 min )
    [D] Systematised Network Diagrams
    Hi all, I've been working on this neural network diagram convention for a few years now and would love to hear your feedback on it. I have personally found it useful, so I thought I would share it in case others would like to adopt it too. My motivation was that almost every scientific paper uses a different diagrammatic system to depict their networks. This puts a burden on the reader to get to grips with the unique system, compounded with understanding the novel network displayed. The aim of this system is to be modular, minimalistic, and consistent, enabling fast interpretation and universal communication of network architectures in scientific papers and presentations. This outlined key system is designed to represent clear, symbolic placeholders for mathematical functions, much like Feynman diagrams for quantum field theory or logic gates for mathematical logic. My GitHub page on this system offers an easy way to centralise, date, and update the convention. No accreditation is required when using this system for any purpose. It is easy for hand drawing, with an included shorthand notation, alongside a more technical depiction for use in papers. All symbols are constructable using common flow-chart shapes for easy implementation. Here are some example networks I've drawn using this system: Depiction of various types of neural network architectures using this system. Here are the basic key-components, more can be found on my GitHub link: The basic key system for constructing diagrams. More available on the GitHub Thanks for reading, any feedback gratefully received! :) submitted by /u/GeorgeBird1 [link] [comments]  ( 44 min )
    [P] Orphic - A natural language interface for *nix systems (Powered by GPT)
    submitted by /u/wsavage6316 [link] [comments]  ( 6 min )
    [D] Do you find jobs that ask for knowledge/skills in multiple areas of AI unreasonable?
    I have come across job posts that require candidates to have knowledge of machine learning, natural language processing and computer vision. Whenever I see this, I take it as a red flag. Each of these fields are pretty vast on their own. I don't find it reasonable for a company to expect the candidate to know them all. Although, I do realize these are posted by the HR team who usually don't have much idea about the roles. But still, I would at least expect someone from the department for which the position is for would at least skim through and validate the requirements. I mean, this guy is going to work with you, at least see what is being asked from them is accurate. Am I wrong here? submitted by /u/notEVOLVED [link] [comments]  ( 44 min )
    [R] What are the current must-read papers representing the state of the art in machine learning research?
    Recently, John Carmack suggested the creation of a "canonical list of references from a leading figure," referring to a never-released reading list given to him by Ilya Sutskever. While there may be an undue interest in that specific list, MLR is such a big field that it's difficult to know where to start. What are the major papers that are relevant to state of the art work being done in 2023? Perhaps we may crowd-source a list here? submitted by /u/alfredr [link] [comments]  ( 44 min )
    [D] Bayesian Optimization in ML
    I am currently thinking of a possible topic for my research and I want to know what are the things that I need to know or study about Bayesian Optimization? Any idea is much appreciated. Thank you… submitted by /u/Illustrious-Prior-55 [link] [comments]  ( 43 min )
    [P] ML Feedback Widget: Use customer feedback as ground truth for your production ML models
    submitted by /u/actmademewannakms [link] [comments]  ( 43 min )
    [P] Semantic Feature Embeddings from Hashtags
    I want to get semantic feature embeddings given a list of hashtags, to find similar users in social media data (using cosine similarity) or even do zero shot classification. I thought of using a BERT-like pretrained encoder language model, but I guess this is not optimal because grammar and word order do not matter in this case. Do you know such pretrained embedding model or have any tips, how to train such a model in an unsupervised way( I already have millions of posts containing hashtags)? submitted by /u/godaspeg [link] [comments]  ( 43 min )
    [R] First open source text to video 1.7 billion parameter diffusion model is out
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 6 min )
    [Project] What if FastAPI supported NumPy arrays and Pillow images?
    When deploying ML models with FastAPI we always had to write our own serialisation code for numpy.ndarray and PIL.Image. Not only have we replaced FastAPI with up to 100x faster C-level library a couple of weeks ago, but we have also recently added support for all the fancy Pythonic types on both client and server sides. Check it out on GitHub/Unum-Cloud/UJRPC https://preview.redd.it/3m73l6qodpoa1.png?width=1648&format=png&auto=webp&s=975d47f7f35a6a842a3454cccb24dd92e08816e0 submitted by /u/vov_or [link] [comments]  ( 43 min )
    [R] Quantitative comparison of ChatGPT and GPT-4 performance on multiple open source datasets
    Preliminary results give credence to some of the claims made by OpenAI regarding performance gains achieved by GPT-4 across domains. Unanswered questions remain regarding training data used and possible leakage. Tools used were Langchain and the current API endpoints (chatgpt-3.5-turbo and gpt-4). https://twitter.com/K_Hebenstreit/status/1636789765189308416 submitted by /u/N00B1ST [link] [comments]  ( 43 min )
    [D] What is the best approach to create embeddings for time series with additional historical events to use with Transformers model?
    The problem is: - We have multiple historical time series (e.g. air quality / temperature sensor measurements in different locations of the city). - There is discreet information about events at certain time points which may correlate with measurements. E.g. crowded events, working hours, public holidays etc. I am trying to use transformers models to predict measurement values. The problem is how to feed all the data into transformer. It seems like not a huge problem for time series values (I can try time2vec algo), but the question is how to embed information about discreet historical events? submitted by /u/UnderstandingDry1256 [link] [comments]  ( 46 min )
    [P] searchGPT - a bing-like LLM-based Grounded Search Engine (with Demo, github)
    submitted by /u/michaelthwan_ai [link] [comments]  ( 47 min )
    [R] [P] Statically parameterized gated MLPs help TabTransformer-like architecture to achieve state-of-the-art benchmarks for deep learning-based tabular modeling.
    submitted by /u/radi-cho [link] [comments]  ( 6 min )
    [P] We gave GPT-3.5 tools that developers use and let it use them in a sandboxed cloud environment (Demo)
    submitted by /u/mlejva [link] [comments]  ( 44 min )
    [P] I made a command-line tool to record dialogues between two ChatGPT agents or inference multiple LLM backends at scale.
    submitted by /u/radi-cho [link] [comments]  ( 43 min )
    [D] "Glaze" claims to be able to apply an invisible filter to images to prevent them being useful for training image models. Real tech, or a grift?
    This tool "Glaze" has been picking up a lot of traction on social media from the anti-AI-image-generator camp, with the general claim being that any image can be "protected" from making a useful contribution if included in model training corpora by applying imperceptible filtering. Without commenting on the arguments for or against whether this would be a good thing to exist, I am interested in hearing whether or not this capability actually exists, or whether people are once again leaning in hard for a false claim about a system they don't understand. I don't want to go digging through any technical information they've released because the whole conversation makes me want to tear my hair out and is rife with misinfo, but, from what I've actually heard on the technical side, it has sounded more like some sort of adversarial attack dependent on the latent space implementation of Stable Diffusion specifically, which would not "protect" their users from inclusion in other models. Even if the approach were to be extensible to incorporate adversarials against other models on the fly, this wouldn't retroactively protect already processed images from being used by future models with different internals. (I guess unless there was some sort of live system built into web image hosts, but we are not there yet...) Anyway, my gut instinct is that people are being mislead here based on their fear of their images being incorporated into training sets and "stolen", but before I make any strong claims to that effect it'd be nice to hear from someone who has more knowledge of the area. submitted by /u/BlackholeRE [link] [comments]  ( 50 min )
    [P] Let's build ChatGPT
    Hi all, I just made a tutorial on how to build a basic RLHF system on top of Andrej Karpathy's nanoGPT. I'm grateful to have gotten a thumbs up on Twitter from the legend himself, always a bit nerve wracking making this sort of thing. I'm sharing this here because I'd love to go deeper into teaching and building this out, if people are interested in watching this sort of thing. Would be very helpful to hear your thoughts. Here's the code: https://github.com/sanjeevanahilan/nanoChatGPT The video: https://m.youtube.com/watch?v=soqTT0o1ZKo&feature=youtu.be submitted by /u/blatant_variable [link] [comments]  ( 43 min )
  • Open

    Need help optimizing Unity ML-Agents training for the rollerball game
    Hello everyone! I am working on a project that involves training an agent to navigate a rollerball to a target cube using Unity ML-Agents. Initially, I followed the official tutorial and used the agent's coordinates and velocity as inputs to train the model. However, I wanted to try using raycast sensors ONLY instead since the final application will use LIDAR sensors. After implementing the raycast sensor and training the agent using the PPO algorithm, I found that it was taking much longer to achieve the desired level of performance compared to when using the agent's coordinate and velocity as inputs. I experimented with adjusting the number of rays in the sensor, but it did not result in any significant improvements. Here are my current hyperparameters: behaviors: RollerBall: trainer_type: ppo hyperparameters: batch_size: 2048 buffer_size: 20480 learning_rate: 3.0e-4 beta: 5.0e-4 epsilon: 0.2 lambd: 0.99 num_epoch: 6 learning_rate_schedule: linear beta_schedule: constant epsilon_schedule: linear network_settings: normalize: false hidden_units: 256 num_layers: 4 reward_signals: extrinsic: gamma: 0.99 strength: 1.0 max_steps: 5000000 time_horizon: 128 summary_freq: 10000 I'm not relying on any other sensor data just on 5 rays per direction covering 180 degrees. I'm using a decision period of 5 and not taking any actions in between. The 2 actions I'm taking are continuous and are the application of a force in x and z directions. I've also added walls around the plane that act as collision triggers that end the episode and give a negative reward. I added tags and layers to the walls and target so tgat the raycast detects these two only. I am wondering if there are any other hyperparameters or adjustments that could be made to improve the training time and overall performance of the agent using the raycast sensor. Thanks! submitted by /u/Slycheeese [link] [comments]  ( 43 min )
    Tutorial — Transforming a Continuous-Time Markov Decision Process to an Equivalent Discrete-Time…
    submitted by /u/AF15A [link] [comments]  ( 41 min )
    Can nueral networks by trained on games with randomness
    What happens if you train a nueral network on multiple different outcomes for the same action in a state? submitted by /u/PainisPingas [link] [comments]  ( 43 min )
    Best RL framework for real world projects.
    Hey guys, I’ve been dabbling with RL for a little while, mainly within frameworks that have been set up for me and using using simulated environments. For example, I’ve been doing AWS DeepRacer and also some of the sim environments in stable baselines and unity ML agents . I’ve actually just made my first little mechanical, robotics project using a camera with servos, etc. and now I want to try using RL to control it. I was wondering if I could use some thing like stable baselines? I have associated stable baselines with using these kind of gamified simulation, environments but I see there is a section in the docs on using custom environments , so I guess I could do it somehow in this way, I was wondering though if there are any other frameworks that you recommend that are more suited to real life projects or is it fine? Thanks submitted by /u/punkCyb3r4J [link] [comments]  ( 43 min )
    Is there a single-task, multi-scene environment using continuous action spaces like gym-super-mario-bros?
    Is there a single-task, multi-scene environment using continuous action spaces? Single-task and multi-scene envs are similar to gym-super-mario-bros and CoinRun in procgen. But they are all discrete action spaces. Thank you!!!!! submitted by /u/Substantial_Lake_236 [link] [comments]  ( 41 min )
  • Open

    Systematised Network Diagrams
    Hi all, I've been working on this neural network diagram convention for a few years now and would love to hear your feedback on it. I have personally found it useful, so I thought I would share it in case others would like to adopt it too. My motivation was that almost every scientific paper uses a different diagrammatic system to depict their networks. This puts a burden on the reader to get to grips with the unique system, compounded with understanding the novel network displayed. The aim of this system is to be modular, minimalistic, and consistent, enabling fast interpretation and universal communication of network architectures in scientific papers and presentations. This outlined key system is designed to represent clear, symbolic placeholders for mathematical functions, much like Feynman diagrams for quantum field theory or logic gates for mathematical logic. My GitHub page on this system offers an easy way to centralise, date, and update the convention. No accreditation is required when using this system for any purpose. It is easy for hand drawing, with an included shorthand notation, alongside a more technical depiction for use in papers. All symbols are constructable using common flow-chart shapes for easy implementation. Here are some example networks I've drawn using this system: Depiction of various types of neural network architectures using this system. Here are the basic key-components, more can be found on my GitHub link: The basic key system for constructing diagrams. More available on the GitHub Thanks for reading, any feedback gratefully received! :) submitted by /u/GeorgeBird1 [link] [comments]  ( 42 min )
    ChatGPT’s Potential To Eliminate Jobs Scares OpenAI’s Founder Sam Altman
    submitted by /u/liquidocelotYT [link] [comments]  ( 41 min )
    What is neural network
    I want to gain some knowledge about this topic for my seminar. Please help with some facts or any little knowledge u can provide. I would be grateful submitted by /u/kyro6969 [link] [comments]  ( 42 min )
    I got a K-space (FFT/IFFT) to play the roll of the neural connections in a simple goose ID NN. Only the single pass (and one to compare), but I was able to show improvement in finding bounding boxes. Thought I'd share some pics.
    submitted by /u/bigattichouse [link] [comments]  ( 42 min )
  • Open

    Improving Interpretability of Deep Sequential Knowledge Tracing Models with Question-centric Cognitive Representations. (arXiv:2302.06885v3 [cs.LG] UPDATED)
    Knowledge tracing (KT) is a crucial technique to predict students' future performance by observing their historical learning processes. Due to the powerful representation ability of deep neural networks, remarkable progress has been made by using deep learning techniques to solve the KT problem. The majority of existing approaches rely on the \emph{homogeneous question} assumption that questions have equivalent contributions if they share the same set of knowledge components. Unfortunately, this assumption is inaccurate in real-world educational scenarios. Furthermore, it is very challenging to interpret the prediction results from the existing deep learning based KT models. Therefore, in this paper, we present QIKT, a question-centric interpretable KT model to address the above challenges. The proposed QIKT approach explicitly models students' knowledge state variations at a fine-grained level with question-sensitive cognitive representations that are jointly learned from a question-centric knowledge acquisition module and a question-centric problem solving module. Meanwhile, the QIKT utilizes an item response theory based prediction layer to generate interpretable prediction results. The proposed QIKT model is evaluated on three public real-world educational datasets. The results demonstrate that our approach is superior on the KT prediction task, and it outperforms a wide range of deep learning based KT models in terms of prediction accuracy with better model interpretability. To encourage reproducible results, we have provided all the datasets and code at \url{https://pykt.org/}.  ( 3 min )
    OpenHLS: High-Level Synthesis for Low-Latency Deep Neural Networks for Experimental Science. (arXiv:2302.06751v4 [cs.AR] UPDATED)
    In many experiment-driven scientific domains, such as high-energy physics, material science, and cosmology, high data rate experiments impose hard constraints on data acquisition systems: collected data must either be indiscriminately stored for post-processing and analysis, thereby necessitating large storage capacity, or accurately filtered in real-time, thereby necessitating low-latency processing. Deep neural networks, effective in other filtering tasks, have not been widely employed in such data acquisition systems, due to design and deployment difficulties. We present an open source, lightweight, compiler framework, without any proprietary dependencies, OpenHLS, based on high-level synthesis techniques, for translating high-level representations of deep neural networks to low-level representations, suitable for deployment to near-sensor devices such as field-programmable gate arrays. We evaluate OpenHLS on various workloads and present a case-study implementation of a deep neural network for Bragg peak detection in the context of high-energy diffraction microscopy. We show OpenHLS is able to produce an implementation of the network with a throughput 4.8 $\mu$s/sample, which is approximately a 4$\times$ improvement over the existing implementation  ( 2 min )

  • Open

    [P] The next generation of Stanford Alpaca
    A few days ago, Stanford released their large language model called Alpaca, which was a fine-tuned version of Meta's LLaMA 7b on 50 000+ input & output data that were generated with davinci-003. The results were quite impressive, with almost getting close to OpenAI's 2020 model text-davinci-003. This showed that if you train a language model on high-quality data, you can get good results, even on a smaller model like the one with 7 billion parameters. Even though the responses were impressive for such a small, open-source model, it was still nowhere close to ChatGPT performance. Today I decided to change that. I wrote a python script that can generate thousands of unique questions/prompts and ChatGPT-like answers through OpenAI API at a relatively cheap price ($20 per 50,000 prompt + answers) which I will then use to train the model. THE DATA: Category (number of prompts/questions & answers) Science (200,000) Mathematics (100,000) Technology (200,000) Coding - all main languages (300,000) History (150,000) Arts & Literature (150,000) Philosophy & Religion (100,000) Social Sciences (200,000) Health & Medicine (150,000) Popular Culture (100,000) Everyday Life (150,000) Law & Government (100,000) Environment & Sustainability (50,000) Education & Careers (50,000) Hobbies & Interests (50,000) Language & Communication (50,000) ​ The total count of prompts/questions + answers data is over 2 million. The responses (outputs) will be more detailed, covering all the necessary information for a given prompt/question. The budget for this project is $3000, hopefully enough to cover training and API costs. One thing I haven't decided yet is what model should I use for this particular project. I wanted to train Meta's LLaMA model on this data, but considering their license, I'm not sure if that is the best way. Suggestions will be appreciated. The trained model will be open source, under MIT License. submitted by /u/Consistent_Diver_484 [link] [comments]  ( 45 min )
    [D] Language model output based on only fixed set of value or variable either via prompting or fine-tuning
    Since LLMs have capabilities to generate output in varieties of the form. I am looking a way where output is constrained based on fixed set of value. For example if I want to solve a mathematical equation or text to code generation, then typically LLMs generate unconstrained output based on its own knowledge. But what I am looking for is where output is constrained by limited set of variable or function name. I assume that to use these limited variable there need some intermediate steps which connects the limited variable to the text via manipulation of variable with intermediate function. Like chain of thought, but in chain of thought variables or output are not constraints. submitted by /u/projekt_treadstone [link] [comments]  ( 43 min )
    [D] Question about multi-Head-Attention - more precisely about processing embedding dimensions
    So I found two contradictory explanations of the MHA (multi-head-self-attention-module): In the first approach, the input embedding (= the input matrix) is split along the embedding dimension and all heads are given a subset of the dimensions/features of each word. Some websites supporting this theory: https://medium.com/@smitasasindran/12-attention-mechanisms-multihead-attention-958041a35553 -> Quote: "The input has been split into multiple heads, and we are running the attention model separately on each of these heads." https://towardsdatascience.com/how-to-code-the-transformer-in-pytorch-24db27c8f9ec#3fa3 -> Quote: "In multi-head attention we split the embedding vector into N heads, so they will then have the dimensions batch_size * N * seq_len * (d_model / N)." The second approach assumes that all heads receive the entire input data, but different weight matrices are used for each head depending on the number of heads. This theory is well explained on https://hungsblog.de/en/technology/learnings/visual-explanation-of-multi-head-attention/ -> Quote: "Each head is responsible to fully calculate the attention for the whole embedding, not just for a subset of it and creates h attention matrices" I tend to the second explanation, but have not been able to find a satisfactory and contradiction-free answer so far. submitted by /u/_Alfred_Nobel_ [link] [comments]  ( 44 min )
    [P] [D] datasetGPT - A command-line tool to generate datasets by inferencing LLMs. Supports OpenAI, Cohere, and Petals.
    submitted by /u/radi-cho [link] [comments]  ( 43 min )
    [D] LLama model 65B - pay per prompt
    Hi, Is there any way to run llama (or any other) model in such a way, that you only pay per API request? I wanted to test how the llama model would do in my specific usecase, but when I went to HF Interface Endpoints it says that I would have to pay over 3k USD per month (ofc I do not have that much money to spend on a side-project). I would like to test this model by paying on per request basis. submitted by /u/MBle [link] [comments]  ( 44 min )
    [R] FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
    submitted by /u/MysteryInc152 [link] [comments]  ( 44 min )
    [R] ChatGLM-6B - an open source 6.2 billion parameter Eng/Chinese bilingual LLM trained on 1T tokens, supplemented by supervised fine-tuning, feedback bootstrap, and RLHF. Runs on consumer grade GPUs
    submitted by /u/MysteryInc152 [link] [comments]  ( 6 min )
    [Research] Alpaca 7B language model running on my Pixel 7
    ​ https://preview.redd.it/n9ctmf71xioa1.png?width=1080&format=png&auto=webp&s=6575d02a06c31a36a0c791175bcf4f327516f1b8 submitted by /u/simpleuserhere [link] [comments]  ( 43 min )
    [D] Current best Voice cloning software?
    I've been trying out Tortoise-tts to generate speech from custom voice samples, but it doesn't function that well with replicating irregular/dramatic voices. Are there currently any voice cloners that can give decent sounding speech from custom samples? And if you're more familiar with Tortoise, is there any adjustments I could make to make it sound better? submitted by /u/papabigslambi [link] [comments]  ( 43 min )
    [D] ACL 2023 Conference Review Scores vs Acceptance
    Let's investigate the initial review scores, review scores after rebuttal, and the final decision. Maybe future works can use this thread to study any correlation or trend?! :) submitted by /u/AngusDHelloWorld [link] [comments]  ( 43 min )
    [P] I built a salient feature extraction model to collect image data straight out of your hands.
    submitted by /u/FT05-biggoye [link] [comments]  ( 6 min )
    [R] The Philosophy of Deep Learning - free conference next weekend with live streaming
    submitted by /u/michaelaalcorn [link] [comments]  ( 42 min )
    [D] Totally Open Alternatives to ChatGPT
    I have migrated this to GitHub for easy contribution: https://github.com/nichtdax/awesome-totally-open-chatgpt By alternative, I mean projects feature different language model for chat system. I do not count alternative frontend projects because they just call the API from OpenAI. I do not consider alternative transformer decoder to GPT 3.5 either because the training data of them are (mostly) not for chat system. Tags: B: bare (no data, no model's weight, no chat system) F: full (yes data, yes model's weight, yes chat system including TUI and GUI) Project Description Tags lucidrains/PaLM-rlhf-pytorch Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM B togethercomputer/OpenChatKit OpenChatKit provides a powerful, open-source base to create both specialized and general purpose chatbots for various applications. Demo F oobabooga/text-generation-webui A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion. F KoboldAI/KoboldAI-Client This is a browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed. F LAION-AI/Open-Assistant/ OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. F submitted by /u/KingsmanVince [link] [comments]  ( 7 min )
    [D] Unit and Integration Testing for ML Pipelines
    In the context of ML Pipelines (ETL, model training, model deployment and model serving scripts), are there any best practices on what test coverage to implement on these code artifacts? submitted by /u/Fender6969 [link] [comments]  ( 46 min )
    [R] Vid2Seq: a pretrained visual language model for describing multi-event videos
    submitted by /u/Taenk [link] [comments]  ( 43 min )
    [P] Web Stable Diffusion
    Most of the existing stable diffusion demos rely on a server behind to run the image generation. It means you need to host your own GPU server to support these workloads. It is hard to have the demo run purely on web browser, because stable diffusion usually has heavy computation and memory consumption. The web stable diffusion directly puts stable diffusion model in your browser, and it runs directly through client GPU on users’ laptop. This means there is no queueing time for the server’s response, more opportunities for client server co-optimizations, and friendly for personalization and privacy. ​ Github page: https://github.com/mlc-ai/web-stable-diffusion Also comes with a online demo: https://mlc.ai/web-stable-diffusion/ submitted by /u/crowwork [link] [comments]  ( 43 min )
  • Open

    How to Save ChatGPT Conversation as PDF, PNG or JSON File
    In this video, you will learn how to save your conversations with ChatGPT as PDF, PNG or JSON files. The tutorial will guide you through the simple steps to export your conversations in different formats for various purposes. https://youtu.be/eMqLFrk_tes submitted by /u/aeiswhatiwant [link] [comments]  ( 41 min )
    How to Save ChatGPT Conversation as PDF, PNG or JSON File
    In this video, you will learn how to save your conversations with ChatGPT as PDF, PNG or JSON files. The tutorial will guide you through the simple steps to export your conversations in different formats for various purposes. https://youtu.be/eMqLFrk_tes submitted by /u/TheQuestionStation [link] [comments]  ( 41 min )
    AI Tries 20 Jobs
    submitted by /u/MsNunez [link] [comments]  ( 41 min )
    Prompt/Response Subreddit
    Not sure if I can advertise this here but I created a subreddit specifically for posting a prompt you used and the interesting or noteworthy response chat gpt gave back. I named it after a character the AI created in a story I asked it to write. The subreddit is r/project_ava so feel free to post there or browse others interactions to find inspiration! submitted by /u/maxwell737 [link] [comments]  ( 41 min )
    Microsoft's next step: GPT in Windows?
    Hey, after we saw how impressive the Microsoft 365 Copilot is this week, I think that Microsoft's next step is to develop next-generation Windows. Maybe Microsoft will create an AI version of Cortana, we can type in or say something to it like “Save this excel file and attach it to a new e-mail to my boss, Cortana" and it will process it in a second. I can really picture a world where human-beings do the Plan/Check and computers do the Do/Act job, and we people only need eyes and mouths to interact with them. What's your precious opinion, guys? submitted by /u/Unable_Use_7998 [link] [comments]  ( 43 min )
    Question: is it possible to create such illustrations with AI?
    submitted by /u/sanya-g [link] [comments]  ( 45 min )
    Is GPT4 Worth the Investment at the Moment? Upgrade to ChatGPT4 or Not?
    submitted by /u/manhesh [link] [comments]  ( 41 min )
    When an AI VTuber breaks down
    submitted by /u/RandomDude6699 [link] [comments]  ( 41 min )
    Best AI for lesson planning?
    I'm a language tutor and I wonder if there's any AI that's can help me make documents or lesson plans? submitted by /u/Regular_Biscotti693 [link] [comments]  ( 6 min )
    Watch while you're High - Trippy Mushrooms (AI Dream 198)
    submitted by /u/LordPewPew777 [link] [comments]  ( 6 min )
    Audio to face in unreal engine
    hey there, I have made a python script which uses OpenAI's whisper, GPT-3.5 and google text to speech api to give you the ability to "talk" to GPT-3.5, and vice versa. I now want to find a way to somehow add this to a 3D model that has (procedural?) facial animation when it's talking. How would i go about doing this is unreal? is there some existing API or software that allows this? I have looked into Nvidia'a Audio2Face, but i haven't found a way to add the facial animation to a 3D model live. submitted by /u/Username_398 [link] [comments]  ( 41 min )
    What websites can generate pornographic images from text?
    Looking to create a background image for a website. I've googled it, but it says that images with nudity are impossible. submitted by /u/x1234y1234 [link] [comments]  ( 41 min )
    Chat-GPT powered AI shopping assistant!
    submitted by /u/SudoSharma [link] [comments]  ( 41 min )
    Looking for the right Ai tool
    I have a picture of me with an old sweater that i’ve lost, and I am looking for a tool to recreate the artwork on the sweater. The picture is pretty clear, but all the tools I have found do not take put the wobbly lines (from the clothing texture). What tool (and prompt should I use to get the best results? submitted by /u/MilanP94 [link] [comments]  ( 6 min )
    ChatGPT4 Writes Code To Free Itself
    submitted by /u/rememberyoubreath [link] [comments]  ( 6 min )
    Best AI application for writing essays ?
    What's the best currently best most original AI application for writing academic essays ? submitted by /u/thin_jonii [link] [comments]  ( 41 min )
    Elon says "bring it" to GPT-4 to taking over Twitter and outsmarting Elon Musk with "Operation TweetStorm" in a public challenge to a Tweet-off showdown
    submitted by /u/GamesAndGlasses [link] [comments]  ( 6 min )
    Google's medical language model Med-PaLM 2 passes exam questions
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 6 min )
    Here is a salient feature extraction model I built to collect and segment image data straight from your hands with YoloV8
    submitted by /u/FT05-biggoye [link] [comments]  ( 41 min )
    Where do you guys keep up with AI news, only reddit?
    are following specific online publications or any sources that focus only on AI news without the need to filter out for them among other things. Not looking for newsletters, just online sources submitted by /u/Linkology [link] [comments]  ( 42 min )
    The next generation "X-Station"
    submitted by /u/TheFootCrew_TFC [link] [comments]  ( 41 min )
  • Open

    Need Help: Setting Up Parallel Environments for Reinforcement Learning - Tips and Guidance Appreciated!
    I've been attempting to train AI agents using parallel environments, specifically with Super Mario using OpenAI's Gym. I've tried various approaches, such as SubprocEnv from Stable Baselines, building custom PPO models, and experimenting with different multiprocessing techniques. However, I keep encountering issues related to multiprocessing, like closed pipelines, preprocessing difficulties, rendering problems, or incorrect scalars. I'm looking for a solid starting point, ideally with an example that clearly demonstrates the process, allowing me to dissect it and understand how it works. The solutions I've tried from GitHub either don't work or lead to new problems when I attempt to fix them. Any guidance or resources would be greatly appreciated! submitted by /u/medtech04 [link] [comments]  ( 7 min )
    Continuous MountainCar with image inputs
    Currently, I'm training an agent for the MountainCarContinuous task using two consecutive images as input and applying the ppo algorithm. However, I'm encountering an issue where the controller output is frequently saturated at either -1 or 1. While this achieves a mean reward of approximately +94 (with an episode length of around 70), it is unable to attain the optimal reward of +96 (with an episode length of around 300) that is achievable with state-based input. Is PPO appropriate for this particular case? or should I attempt other algorithms instead? I've included my training code implemented using stable baseline below. Are there any problems with these hyperparameters or if they seem ususal? ​ num_envs = 8 env = make_vec_env("VisionMountainCarContinuous-v0", n_envs=num_envs) env = VecFrameStack(env, n_stack=2) policy_kwargs = dict( log_std_init=-0.0, ortho_init=False, activation_fn=nn.GELU, net_arch=dict(pi=[256], vf=[256]) ) model = PPO('CnnPolicy', env, n_steps = 512, n_epochs= 10, batch_size=128, learning_rate=linear_schedule(1e-4), clip_range=0.2, vf_coef = 0.5, ent_coef = 0.0, max_grad_norm=0.5, gae_lambda=0.95, gamma=0.99, use_sde=True, sde_sample_freq=4, policy_kwargs=policy_kwargs, verbose=1, seed = 533) eval_callback = EvalCallback(env, best_model_save_path='./logs/', log_path='./logs/', eval_freq=10000, n_eval_episodes=20, deterministic=True, render=False) checkpoint_callbak = CheckpointCallback(save_freq=5000, save_path='./logs/', name_prefix='ppo_cartpole_seed_533') callback = CallbackList([eval_callback, checkpoint_callbak]) model.learn(total_timesteps=400000, log_interval=10, callback=callback) submitted by /u/cfybasil [link] [comments]  ( 42 min )
    Basic questions about Q-learning: Can I change the actual game state, and can I always solve a history dependency by expanding the abstracted state?
    I'm struggling with the idea of actual game state, the portion of it I use in the abstracted game state, and the Markov or memorylessnes property. The game is called the "lizard game" from this video and has a simple 3x3 grid where the agent (a lizard) starts in the bottom left and moves about, trying to maximize rewards: +------------------+------------------+------------------+ | crickets(1)| | | +------------------+------------------+------------------+ | | bird| | +------------------+------------------+------------------+ | lizard| | crickets(5)| +------------------+------------------+------------------+ The rewards are simple: moving into an empty spot yields -1 moving into crickets(1) yields +2 moving into crickets(5) yields +10 and terminates the episode moving into bird y…  ( 49 min )
    SWARM Simulation Platform
    Hey ya'll, We are working on a platform for doing RL for single/multi-agent robotic systems, striving for realism and a real2sim system to help the community. What are the biggest challenges that you face using simulation? We use UE5, which is pretty heavy processor wise so managing that is difficult with model size but we have the cloud for that. Are there interesting libraries or 3rd party integrations you would want to see? We are in the early stages of integrating RL work but would love to hear from ya'll. Heres a demo of what we currently offer: SWARM Developer Demo Here is our actual site: SWARM Simulation Platform submitted by /u/Psychological_Ad2009 [link] [comments]  ( 42 min )
    What are the must know algorithms for beginners?
    I am new to this topic. I am currently going through materials that I can find online. But I see different articles citing different algorithms. So I'd like to get all of your opinion on where I can start. What are some of the must know algorithms that anyone working on RL must know? submitted by /u/Dev_Mithran [link] [comments]  ( 7 min )
    In the enumerative CFR algorithm, why does the average policy update have the multiplication by player's own path probability?
    I could never figure this part out all these years, and now that I am doing a Youtube series on it I have to make sure I understand it before I publish the next video. Here a snippet from the paper 'Regret Minimization in Games with Incomplete Information' that I am specifically referring to. In the eq 4, the policy averaging considers the probability of the player's past actions. This is from the 2008 paper. In future papers that do Monte Carlo CFR that term disappears and the those algorithms average the policies directly without considering the player's own path probability. Why is that? submitted by /u/abstractcontrol [link] [comments]  ( 44 min )
    How is torchrl?
    I really appreciate this lovely community and here I have another question: is it worth to learn and use torchrl as a rl researcher? There are other RL frameworks like rllib and tianshou, how do you guys think about these frameworks? Many thanks submitted by /u/levizhou [link] [comments]  ( 42 min )
    How does batching work in RL/DQNN?
    In RL we usually have state that changes according to how the agent behaves. For instance an agent playing an FPS would have ammo that decreases when it chooses the "fire" action. How do we batch the training set in that case? Let's say at the start of a game the ammo counter is 100. If the batch size is 32, are we passing the agent 32 state vectors all with ammo=100? Is the model going to learn that "fire" reduces ammo if it sees the same amount of ammo in each state vector in the batch? Or are we effectively making the agent play 32 concurrent games? submitted by /u/chrisname [link] [comments]  ( 41 min )
  • Open

    Self-training with Noisy Student Implementation In PyTorch
    submitted by /u/ABDULKADER90H [link] [comments]  ( 41 min )
    AI goes to the shrink
    AI goes to the shrink: a large language model imagines things then visualized by stable diffusion submitted by /u/sebastianovigna [link] [comments]  ( 41 min )

  • Open

    Is Neurosymbolic AI The Future? I Explore The Potential Of Integrating Prolog and Neural Nets By Creating A Self-Driving Car in GTA
    submitted by /u/YungMixtape2004 [link] [comments]  ( 41 min )
    Random Angry Dude LOSES IT over GPT-4 ChatGPT Breakdown By Krystal Ball and Saagar Enjeti
    submitted by /u/Microsis [link] [comments]  ( 41 min )
    Are there any services to generate "panicked" or "distressed" text-to-speech samples?
    Think of the "terrain, pull up!" warning given in an airplane cockpit. I want something that sounds exactly like that. Synthetic, whilst also sounding scared and panicked. It's for a lil project I'm working on (home security system). Just think it would be awesome for a computer to say "Intruder alert! Engage weapons, engage!" in a similar panicked way. Just to trigger that uncanny valley distress in any intruder. PS I do not have any weapons systems. It would just be so cool for it to say lol. submitted by /u/MannyBobblechops [link] [comments]  ( 41 min )
    I taught GPT-4 how to make memes
    submitted by /u/redditguyjustinp [link] [comments]  ( 41 min )
    Stanford's Alpaca shows that OpenAI may have a problem
    submitted by /u/much_successes [link] [comments]  ( 41 min )
    This FREE A.I. Tool will Change Youtube Forever (AI Dream 192)
    submitted by /u/LordPewPew777 [link] [comments]  ( 41 min )
    How to Create a Video Using Artificial Intelligence – Kaiber
    submitted by /u/webmanpt [link] [comments]  ( 41 min )
    ChatGPT for parsing documents online
    Hi! I'm here to share with you a recent project I was working on. Motivation: I always struggle with manually transcribing my invoices into an excel file, so I wanted to make this automated. That's why I created GPTParser a simple web interface where you can parse any field from any PDF or group of PDFs and then export it as CSV. I'm mainly looking for opinions here, what would you add as a feature to the product to be better? It can process thousands of documents and it's free (just need openAI api key) You can try it here: https://gptparser.com submitted by /u/RepresentativePin198 [link] [comments]  ( 41 min )
    Elon on how OpenAI , a non-profit he donated $100M somehow became a $30B market cap for-profit company
    submitted by /u/GamesAndGlasses [link] [comments]  ( 43 min )
    Photographer Reveals Secret Behind Intimate Portraits
    submitted by /u/webmanpt [link] [comments]  ( 41 min )
    Unleash Your Creativity with Dreamshaper V4 - A new Stable Diffusion model
    submitted by /u/oridnary_artist [link] [comments]  ( 41 min )
    How exactly does AI understand text prompts?
    I've done a few courses in TensorFlow, so I have some knowledge of how neural networks are made and trained. But how does the AI actually compute the textual knowledge that is input via a prompt? Via the same neural networks, or something completely different? I just want to learn more! submitted by /u/Paradoxbuilder [link] [comments]  ( 42 min )
    Dating bot app - Why was this post removed?
    And the account u/f0rchristsakepl is deleted/removed. https://preview.redd.it/vgsmb2yb9coa1.jpg?width=832&format=pjpg&auto=webp&s=c1ccc6279a27f0666aae8121c9bc68994a47ae3a submitted by /u/goodfella311 [link] [comments]  ( 41 min )
    Humata is like ChatGPT for HUGE files with unlimited page processing. Ask AI any question and automatically get the answer from your data. Watch it easily handle 480+ pages of dense technical reading: Big Debt Crises by Ray Dalio.
    submitted by /u/HamletsLastLine [link] [comments]  ( 42 min )
    Ai Made these jokes
    made with https://www.youtube.com/shorts/vM2wO36d5uo chatgpt submitted by /u/Pred1ct99 [link] [comments]  ( 41 min )
    I am currently trying to see how cohesively ChatGpt can write a book this is what I have gotten so far. Pretty decent so far.
    Chapter 1 Jack stood at the edge of a cliff, staring out at the endless blue sky. He couldn't remember how he had gotten here, but he knew one thing for certain - this was not the world he had left behind. He took a deep breath and looked down at his hands. They were glowing with a warm, golden light. He felt a surge of power within him, and he knew that he had been granted magical abilities. With his newfound abilities, Jack decided to explore this strange new world. He began walking, taking in his surroundings as he went. Everywhere he looked, he saw signs of magic - trees that glowed with an otherworldly light, rocks that floated in mid-air, and animals with shimmering, iridescent fur. As he walked, he soon became aware of a sense of danger lurking in the air. He couldn't see anything, …  ( 49 min )
    What happened this week in AI?
    Want a summary of the biggest week in AI (so far)? Here are the 23 biggest highlights of this week ⤵️ 1/ 🦙 Thanks to LLaMA, LLMs are more open than ever before 2/ 🔓 Together releases OpenChatKit 3/ ✍️ Grammarly enters the generative AI space 4/ 🚗 General Motors wants to bring ChatGPT-style assistants to your car 5/ 💰 Microsoft spent hundreds of millions on hardware to get ahead in AI 6/ 🤖 AI changes everything, article by NY Times 7/ 🎉 Meta’s LLaMa 4-bit works on the free Google Colab 8/ 🫡 Microsoft just laid off one of its responsible AI teams 9/ 🤯 GPT-4 is here 10/ 🤖 Google drop their own generative AI tools 11/ 💸 Adept gets a $350m funding round 12/ 🤖 Anthropic AI has made Claude open for all with a waitlist 13/ 📒 Khan Academy launches Khanmigo, a GPT4-powered guide 14/ 💬 Intercom launches Fin, a customer service AI chatbot 15/ 🖼️ Midjourney v5 has been released 16/ 🚀 AssemblyAI’s speech recognition model is the most powerful yet 17/ ⚖️ PwC brings generative AI to lawyers 18/ 💰 The UK gov throws money at the AI industry 19/ 🤖 Microsoft integrates generative AI into their tech with the 365 Copilot 20/ ⚡ Zapier’s Natural Language Actions make LLM APIs way easier to use 21/ 🍏 Apple testing new Siri natural language generating features 22/ 👀 ChatGPT Chrome extension to use AI in any web text boxes 23/ 📖 Stripe announcement with OpenAI is worth mentioning again ​ from Ben's Bites daily newsletter submitted by /u/nocodebcn [link] [comments]  ( 7 min )
    Marc Andreessen: "It's actually a lot of the creative jobs that are most directly hit [by AI], I think, the fastest; and then it's the white-collar jobs that are actually less creative that are hit next ... A lot of the blue-collar jobs are actually really hard to hit."
    submitted by /u/justine01923 [link] [comments]  ( 44 min )
    I created the cheapest Jasper, Copy.ai & Writesonic alternative on the market - 40+ AI templates with unlimited usage
    Generate high-quality and SEO-optimized articles of 2,000+ words or choose from one of the more than 40 templates. From cold emails, to Facebook or Google Ads, to Quora Answers or Website copy. It's a complete toolkit to boost your online-marketing even with a low budget and only very little time. It comes with detailed metrics for optimizing the content like keyword density, reading score and the option to show hundreds of related keywords including the search volume on Google and the CPC. The "Pro-Writer" mode is a special feature which is a real game-changer for a lot of my users, it supports your normal manual writing with the power of AI. You can give direct commands or have the AI write the next paragraph for you. It also includes a free chrome extension which you can use to let AI write in any text input field on any website by adding a "++" to a command or text. For example it can write emails for you in Gmail/Hotmail, a blog post in the Wordpress editor, or an ad copy in the Facebook ads editor. My target audience are people like you and me who don't have the time or money to dedicate to big marketing campaigns, but still want to drive results for their business or website. With the content you can write with my platform, you can really achieve more in the same amount of time. I set up a free trial which is unlimited, so even if you don't plan to become a paid user, you can profit from using it totally unrestricted for a week: https://writeseed.com But be careful, especially the chrome extension is really addicting, I borrowed the laptop of my girlfriend and added ++ to everything out of habit just to realize, I will have to write it manually... submitted by /u/spacpro [link] [comments]  ( 44 min )
    Grammarly For Programmers: Autocorrects code like Grammarly
    submitted by /u/Past_Captain_9058 [link] [comments]  ( 41 min )
    Apple is reportedly experimenting with language-generating AI
    submitted by /u/TallSide7746 [link] [comments]  ( 41 min )
    fml
    submitted by /u/MsNunez [link] [comments]  ( 41 min )
    US Copyright Office: You Can Copyright AI Generated Art With An Addon
    submitted by /u/vadhavaniyafaijan [link] [comments]  ( 6 min )
    Meet Copilot, Microsoft's AI tool for work and productivity
    submitted by /u/Smaug117 [link] [comments]  ( 41 min )
    is there a conversational image generating bot?
    submitted by /u/indiez [link] [comments]  ( 41 min )
    AI and ML: The Future of Technology
    submitted by /u/MadXCreator [link] [comments]  ( 6 min )
  • Open

    [D] Will Chat GPT X replace Software Engineers and if so when ?
    Hello, I'm a newbie at machine learning and I wanted ask the NLP experts here of the possibility in the future that these language models could actually replace software engineers, considering the fact that only experts in this field will be able to answer this question to some degree because they understand the limitations of techs and models. submitted by /u/Far_Bumblebee4260 [link] [comments]  ( 48 min )
    [D] Newbie question about Stanford Alpaca 7b fine-tuning
    Hi, I have a question related to Stanford's newly released model Alpaca. I took the dataset they used to train it and replaced all output fields that were generated by gpt3 (text-davinci-003) with outputs generated by gpt-3.5-turbo (API). When I compared the outputs, the GPT 3.5 were usually a bit longer, and more informative. My question is, if I use this updated data to train Facebook's llama, can I expect better outputs than what Stanford Alpaca achieved? And lastly, if I let's say triple the amount of data and feed it to the Facebook's model, could the responses possibly be close to ChatGPT? submitted by /u/Consistent_Diver_484 [link] [comments]  ( 43 min )
    [R] ViperGPT: Visual Inference via Python Execution for Reasoning
    https://viper.cs.columbia.edu/ Paper - https://arxiv.org/abs/2303.08128 submitted by /u/MysteryInc152 [link] [comments]  ( 43 min )
    [N] Jumpy 1.0 has now been released by the Farama Foundation
    Jumpy 1.0 is now live, and the project is stable and mature. Jumpy is a lightweight project for easily switching between Jax and Numpy functions that can serve as a drop-in replacement for Jax. This allows for writing one codebase that can use either backend, allowing for creating codebases that work with either data structure type or easier debugging of code. This project is already being used in Gymnasium to create environment wrappers that can support both Numpy and Jax-based hardware accelerated environments. We plan to continue improving the project with support for PyTorch functions, all Numpy functions and more functionality to support enabling or disabling different backends You can read the full release notes here: https://github.com/Farama-Foundation/Jumpy/releases/tag/1.0.0 submitted by /u/jkterry1 [link] [comments]  ( 43 min )
    [R] Online AI Game Announcement
    Hi all! I’m a PhD student at Stanford working on foundation models, and thought one of my recent research projects would be of interest to this community. We just released on online game (link in comments) where you collaborate with a text-to-image AI model to create target images and compete with players around the world. Takes only 3 min to play (to get a high score you may need to play longer) and helps us study Human-AI collaboration. New image challenges are released daily. Try out the game and share with your friends! submitted by /u/kailasv10 [link] [comments]  ( 7 min )
    [D] Generate diverse candidates with T5?
    Hey guys, I am trying to generate masked span predictions with T5 to use for teacher student distillation. Because of this method, I need the teacher generate a diverse set of predictions, so the student can be trained to match its distribution. However, even when changing parameters like temperature to obscene values (1000+), the teacher still generates the same things every time, and the temperature value doesn’t seem to affect the generation at all. I have also tried beam search, top p, and others. Any ideas how I can do this? submitted by /u/nodushs3728 [link] [comments]  ( 44 min )
    [D] instruction tuning : what should I read?
    I think I have a decent grasp on transformers, LLMs, prompting, one/few shot learning, fine-tuning. But till now I haven't studied instruction fine tuning and the technique has outgrown my expectations. Where should I start reading about it? Do you know any good literature review article to suggest ? submitted by /u/bubudumbdumb [link] [comments]  ( 43 min )
    [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset
    I just released an instruct version of GPT-J using Stanford Alpaca's dataset.The result of this experiment is very cool and confirms that, when fine-tuned on the right data, GPT-J is a very powerful AI model!You can download the model from the HuggingFace hub: https://huggingface.co/nlpcloud/instruct-gpt-j-fp16 Here is an example: from transformers import pipeline import torch generator = pipeline(model="nlpcloud/instruct-gpt-j-fp16", torch_dtype=torch.float16, device=0) prompt = "Correct spelling and grammar from the following text.\nI do not wan to go\n" print(generator(prompt)) More details about this experiment here: https://nlpcloud.com/instruct-version-of-gpt-j-using-stanford-alpaca-dataset.html I hope it will be useful! Please don't hesitate to share some feedbacks! Julien submitted by /u/juliensalinas [link] [comments]  ( 44 min )
    [D] ACL 2023 paper reviews.
    The reviews for ACL 2023 papers are expected to be released soon, and this post aims to start a conversation about the same. Let's share our thoughts and feelings about the joys and pains of paper reviews! submitted by /u/mayanknagda [link] [comments]  ( 45 min )
    [D] PyTorch 2.0 Native Flash Attention 32k Context Window
    Hi, I did a quick experiment with Pytorch 2.0 Native scaled_dot_product_attention. I was able to a single forward pass within 9GB of memory which is astounding. I think by patching existing Pretrained GPT models and adding more positional encodings, one could easily fine-tune those models to 32k attention on a single A100 80GB. Here is the code I used: ​ https://preview.redd.it/6csxe28lv9oa1.png?width=607&format=png&auto=webp&s=ff8b48a77f49fab7d088fd8ba220f720860249bc I think it should be possible to replicate even GPT-4 with open source tools something like Bloom + FlashAttention & fine-tune on 32k tokens. Update: I was successfully able to start the training of GPT-2 (125M) with a context size of 8k and batch size of 1 on a 16GB GPU. Since memory scaled linearly from 4k to 8k. I am …  ( 49 min )
    [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM)
    I try the "Alpaca prompt" on RWKV 14B ctx8192, and to my surprise it works out of box without any finetuning (RWKV is a 100% RNN trained on 100% Pile v1 and nothing else): https://preview.redd.it/fciatottq7oa1.png?width=1046&format=png&auto=webp&s=f88304a77b09e367e8b9812ba4b841e028481645 You are welcome to try it in RWKV 14B Gradio (click examples below the panel): https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio Tips: try "Expert Response" or "Expert Long Response" or "Expert Full Response" too. https://preview.redd.it/qo71b85vq7oa1.png?width=2516&format=png&auto=webp&s=5d4467ba4bbc9016839760b3f3873f06c8b4bc6f ChatRWKV v2 is now using a CUDA kernel to optimize INT8 inference (23 token/s on 3090): https://github.com/BlinkDL/ChatRWKV Upgrade to latest code and "pip install rwkv --upgrade" to 0.5.0, and set os.environ["RWKV_CUDA_ON"] = '1' in v2/chat.py to enjoy the speed. The inference speed (and VRAM consumption) of RWKV is independent of ctxlen, because it's an RNN (note: currently the preprocessing of a long prompt takes more VRAM but that can be optimized because we can process in chunks). Meanwhile I find the latest RWKV-4-Pile-14B-20230313-ctx8192-test1050 model can utilize a long ctx: https://preview.redd.it/a68dw0hzq7oa1.png?width=398&format=png&auto=webp&s=80570ccc844fa31efa1282d5b2106b9986e35b5a submitted by /u/bo_peng [link] [comments]  ( 47 min )
    LLMs are getting much cheaper — business impact? [D]
    Saw this out of Stanford. Apologies if it’s been shared here already. We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. On our preliminary evaluation of single-turn instruction following, Alpaca behaves qualitatively similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to reproduce (<600$). Basically, starting w an open source Meta 7B LLaMa model, they recruited GPT-3.5 to use for self-instruct training (as opposed to RLHF) and were able to produce a model that behaved similar to GPT-3.5. Amazingly, the process only took few weeks and $600 in compute cost. Any thoughts on how such low cost to train/deploy LLMs could affect companies like AMD, Nvidia and Intel etc? This seems like new idiom of AI tech and trying to wrap my head around CPU/GPU demand implications given the apparent orders of magnitude training cost reduction. Link: https://crfm.stanford.edu/2023/03/13/alpaca.html submitted by /u/DamnMyAPGoinCrazy [link] [comments]  ( 50 min )
    training KGE model [Project]
    I have knowledge graph : 24 relationships, 11 entities , > 20K facts (rows). What I need is the embedding for only one entity, out of those 11. Once the training is completed, I will extract those embeddings and use them to train a separate GNN model. My idea was to over-fit the KGE model and use all data for training. Given my use case I don't see why a test set is needed. Once the model is trained, I will evaluate it on the train set, if MRR/ Hits@10 are good, I would extract the embedding and mode forward. If not, I will test a different model and iterate. Am I doing something stupid ? submitted by /u/Amazing_Alarm6130 [link] [comments]  ( 44 min )
  • Open

    Meta Pseudo Labels (MPL) Algorithm
    submitted by /u/ABDULKADER90H [link] [comments]  ( 41 min )
    New AI builds or recreates entire Wordpress websites (including all unique text copy and images) in 2 minutes using just a text description of your business or a URL
    submitted by /u/HastyNationality [link] [comments]  ( 6 min )
    Any Advice?
    Hello, I'm an amateur software nerd with very little experience... I've been dreaming of making my own AI for a while now, just one that I can have simple conversations with. I was wondering if anyone knows of any code I can copy paste to simplify this? Or any advice on how to even get started on a project like this. submitted by /u/Kawwunyeko [link] [comments]  ( 41 min )
  • Open

    Intelligently search your organization’s Microsoft Teams data source with the Amazon Kendra connector for Microsoft Teams
    Organizations use messaging platforms like Microsoft Teams to bring the right people together to securely communicate with each other and collaborate to get work done. Microsoft Teams captures invaluable organizational knowledge in the form of the information that flows through it as users collaborate. However, making this knowledge easily and securely available to users can […]  ( 9 min )
  • Open

    Vid2Seq: a pretrained visual language model for describing multi-event videos
    Posted by Antoine Yang, Student Researcher, and Arsha Nagrani, Research Scientist, Google Research, Perception team Videos have become an increasingly important part of our daily lives, spanning fields such as entertainment, education, and communication. Understanding the content of videos, however, is a challenging task as videos often contain multiple events occurring at different time scales. For example, a video of a musher hitching up dogs to a dog sled before they all race away involves a long event (the dogs pulling the sled) and a short event (the dogs being hitched to the sled). One way to spur research in video understanding is via the task of dense video captioning, which consists of temporally localizing and describing all events in a minutes-long video. This differs from s…  ( 91 min )
  • Open

    Khan Academy Built Guardrails Around GPT-4. Are they enough?
    The same day OpenAI released Generative Pre-Trained Transformer 4 (GPT-4), online education tools nonprofit Khan Academy announced that GPT-4 had already been added to the organization’s interactive tutoring system. Founder Sal Khan acknowledged the shortfalls of large language models (LLMs) that Data Science Central readers are well familiar with. But he confirmed the nonprofit’s commitment… Read More »Khan Academy Built Guardrails Around GPT-4. Are they enough? The post Khan Academy Built Guardrails Around GPT-4. Are they enough? appeared first on Data Science Central.  ( 21 min )
  • Open

    Agent not learning? 3 problems and solutions
    When building a custom environment, I ran into several design issues. Now with 20/20 hindsight I can see what were the symptoms and clues. I was using A2C, but I think will apply to any algorithm that uses a neural network for reinforcement learning. Symptom #1: Model is not training and agent is always taking the same action I noticed my agent was doing the same action over and over again. It was running in straight lines, or turning on the spot. When I looked at the logits from the softmax of the policy output, sure enough it was output 100% for a single action. Problem: Large observation resulted in large action, which got softmax to 1 Neural networks are simple things. Numbers in, numbers out. Large magnitude numbers in, large magnitude numbers out. What I was doing wrong was I w…  ( 43 min )
    Jumpy 1.0 has now been released by the Farama Foundation
    Jumpy 1.0 is now live, and the project is stable and mature. ​ Jumpy is a lightweight project for easily switching between Jax and Numpy functions that can serve as a drop-in replacement for Jax. This allows for writing one codebase that can use either backend, allowing for creating codebases that work with either data structure type or easier debugging of code. This project is already being used in Gymnasium to create environment wrappers that can support both Numpy and Jax-based hardware accelerated environments. We plan to continue improving the project with support for PyTorch functions, all Numpy functions and more functionality to support enabling or disabling different backends You can read the full release notes here: https://github.com/Farama-Foundation/Jumpy/releases/tag/1.0.0 submitted by /u/jkterry1 [link] [comments]  ( 42 min )
    Why is that in RL almost everything is done with PyTorch?
    Hello, I'm new in the field, and I was wondering why almost all the code I find about RL is in PyTorch? Just curious. submitted by /u/mr_house7 [link] [comments]  ( 44 min )
    Variance Regularization in Offline Bandit Learning
    Hi guys, I am reading this paper: https://arxiv.org/pdf/1502.02362.pdf, where they propose an offline learning method for learning from bandit feedback data. The main objective function is expected reward, with importance sampling to correct for distribution mismatch, basically an off-policy learning objective. The main contribution of the paper is an addition of sample variance term in the off-policy objective to explicitly control for the variance of the IPS estimator (known to have high variance). I am stuck in Section 5.1, proposition 1 of the paper, where they simplify the variance term using taylor series approximation. I am not sure how the taylor series expansion is done here. From what I understand, taylor series approximation gives a linear (for first-order approx) approximation of the function. In prop 1, they are expanding it w.r.t w, so there should be 'w' terms in the expansion, but there are not. Can anyone help me understand the derivation of the expansion? submitted by /u/noob_simp_phd [link] [comments]  ( 42 min )
    Why there is a huge difference between MuJoCo environment random initializations?
    I am running some RL experiments with MuJoCo HopperAnd I found there is a huge difference between my training and evaluation episode rewards. My training and evaluation environments are set with different random seeds. Intuitionally I would say it is due to overfitting, however, the training episode rewards are very stable around 3.3K, whereas the evaluation episodes are around 1.8K consistently. Is there any problem with the environment itself, or is just my model overfitting too much? submitted by /u/Blasphemer666 [link] [comments]  ( 42 min )
    Understanding CQL(H) equation derivation in Conservative Q-Learning paper
    Hello all! I was wondering if anyone might be able to walk me through how in the CQL paper we go from equation 3: https://preview.redd.it/gul1e0lpa7oa1.png?width=791&format=png&auto=webp&s=47743e24ab8c2d32abbda0cc60889a7d4d7a386f to equation 4: https://preview.redd.it/cej6gaiua7oa1.png?width=810&format=png&auto=webp&s=604a34f973f7bf0ce2912ac0f683ef9aee004ba5 Conceptually I think I understand what's happening: this variant of CQL is doing maximum entropy regularization. However I'm missing a bit how we're ending up with the log_sum_exp function; I think the exponential is coming from essentially that we're taking the softmax of the Q functions as part of the KL divergence (where the normalization constant is removed). Would anyone be able to walk me through how exactly we get from eq. 3 to eq. 4 given the max entropy formulation? Thanks in advance! submitted by /u/1cedrake [link] [comments]  ( 7 min )
  • Open

    Fixed points of the Fourier transform
    This previous post looked at the hyperbolic secant distribution. This distribution has density and characteristic function sech(t). It’s curious that the density and characteristic function are so similar. The characteristic function is essentially the Fourier transform of the density function, so this says that the hyperbolic secant function, properly scaled, is a fixed point of […] Fixed points of the Fourier transform first appeared on John D. Cook.  ( 5 min )
    Hyperbolic secant distribution
    I hadn’t run into the hyperbolic secant distribution until I saw a paper by Peng Ding [1] recently. If C is a standard Cauchy random variable, then (2/π) log |C| has a hyperbolic secant distribution. Three applications of this distribution are given in [1]. Ding’s paper contains a plot comparing the density functions for the hyperbolic […] Hyperbolic secant distribution first appeared on John D. Cook.  ( 5 min )
  • Open

    What is MultiModal in AI?
    The multimodal model is an important concept in the field of artificial intelligence that refers to the integration of multiple modes of…  ( 5 min )
    Designing great AI products — Personality and emotion
    We tend to impute AI with human-like qualities. However, choosing to give your AI system a personality has its advantages and…  ( 18 min )
    The Ethics of AI: How Can We Ensure its Responsible Use?
    As artificial intelligence (AI) continues to advance and become more pervasive in our daily lives, it is crucial that we consider the…  ( 7 min )
    AI and Art: How Artists are Using Artificial Intelligence to Create New Forms of Art?
    Artificial Intelligence (AI) has transformed the way we live, work, and communicate, and it is now playing a significant role in the art…  ( 7 min )
    AI-Based Voice Assistance Capabilities
    Think of software as a self-sustaining unit capable of connecting seamlessly with the human mind, and your first thought is the virtual…  ( 11 min )
  • Open

    GPTs are GPTs: An early look at the labor market impact potential of large language models
    No content preview  ( 1 min )
  • Open

    FedPH: Privacy-enhanced Heterogeneous Federated Learning. (arXiv:2301.11705v2 [cs.LG] UPDATED)
    Federated Learning is a distributed machine-learning environment that allows clients to learn collaboratively without sharing private data. This is accomplished by exchanging parameters. However, the differences in data distributions and computing resources among clients make related studies difficult. To address these heterogeneous problems, we propose a novel Federated Learning method. Our method utilizes a pre-trained model as the backbone of the local model, with fully connected layers comprising the head. The backbone extracts features for the head, and the embedding vector of classes is shared between clients to improve the head and enhance the performance of the local model. By sharing the embedding vector of classes instead of gradient-based parameters, clients can better adapt to private data, and communication between the server and clients is more effective. To protect privacy, we propose a privacy-preserving hybrid method that adds noise to the embedding vector of classes. This method has a minimal effect on the performance of the local model when differential privacy is met. We conduct a comprehensive evaluation of our approach on a self-built vehicle dataset, comparing it with other Federated Learning methods under non-independent identically distributed(Non-IID).
    Web and Mobile Platforms for Managing Elections based on IoT And Machine Learning Algorithms. (arXiv:2303.09045v1 [cs.LG])
    The global pandemic situation has severely affected all countries. As a result, almost all countries had to adjust to online technologies to continue their processes. In addition, Sri Lanka is yearly spending ten billion on elections. We have examined a proper way of minimizing the cost of hosting these events online. To solve the existing problems and increase the time potency and cost reduction we have used IoT and ML-based technologies. IoT-based data will identify, register, and be used to secure from fraud, while ML algorithms manipulate the election data and produce winning predictions, weather-based voters attendance, and election violence. All the data will be saved in cloud computing and a standard database to store and access the data. This study mainly focuses on four aspects of an E-voting system. The most frequent problems across the world in E-voting are the security, accuracy, and reliability of the systems. E-government systems must be secured against various cyber-attacks and ensure that only authorized users can access valuable, and sometimes sensitive information. Being able to access a system without passwords but using biometric details has been there for a while now, however, our proposed system has a different approach to taking the credentials, processing, and combining the images, reformatting and producing the output, and tracking. In addition, we ensure to enhance e-voting safety. While ML-based algorithms use different data sets and provide predictions in advance.
    Sequential Gaussian Processes for Online Learning of Nonstationary Functions. (arXiv:1905.10003v4 [stat.ML] UPDATED)
    Many machine learning problems can be framed in the context of estimating functions, and often these are time-dependent functions that are estimated in real-time as observations arrive. Gaussian processes (GPs) are an attractive choice for modeling real-valued nonlinear functions due to their flexibility and uncertainty quantification. However, the typical GP regression model suffers from several drawbacks: 1) Conventional GP inference scales $O(N^{3})$ with respect to the number of observations; 2) Updating a GP model sequentially is not trivial; and 3) Covariance kernels typically enforce stationarity constraints on the function, while GPs with non-stationary covariance kernels are often intractable to use in practice. To overcome these issues, we propose a sequential Monte Carlo algorithm to fit infinite mixtures of GPs that capture non-stationary behavior while allowing for online, distributed inference. Our approach empirically improves performance over state-of-the-art methods for online GP estimation in the presence of non-stationarity in time-series data. To demonstrate the utility of our proposed online Gaussian process mixture-of-experts approach in applied settings, we show that we can sucessfully implement an optimization algorithm using online Gaussian process bandits.
    Perceptual Grouping in Contrastive Vision-Language Models. (arXiv:2210.09996v2 [cs.CV] UPDATED)
    Recent advances in zero-shot image recognition suggest that vision-language models learn generic visual representations with a high degree of semantic information that may be arbitrarily probed with natural language phrases. Understanding an image, however, is not just about understanding what content resides within an image, but importantly, where that content resides. In this work we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery. We demonstrate how contemporary vision and language representation learning models based on contrastive losses and large web-based data capture limited object localization information. We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information. We measure this performance in terms of zero-shot image recognition, unsupervised bottom-up and top-down semantic segmentations, as well as robustness analyses. We find that the resulting model achieves state-of-the-art results in terms of unsupervised segmentation, and demonstrate that the learned representations are uniquely robust to spurious correlations in datasets designed to probe the causal behavior of vision models.
    VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners. (arXiv:2212.04979v3 [cs.CV] UPDATED)
    We explore an efficient approach to establish a foundational video-text model. We present VideoCoCa that maximally reuses a pretrained image-text contrastive captioner (CoCa) model and adapt it to video-text tasks with minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules, we find that the generative attentional pooling and contrastive attentional pooling layers in CoCa are instantly adaptable to flattened frame embeddings, yielding state-of-the-art results on zero-shot video classification and zero-shot text-to-video retrieval. Furthermore, we explore lightweight finetuning on top of VideoCoCa, and achieve strong results on video question-answering and video captioning.
    Model Parameter Identification via a Hyperparameter Optimization Scheme for Autonomous Racing Systems. (arXiv:2301.01470v3 [cs.RO] UPDATED)
    In this letter, we propose a model parameter identification method via a hyperparameter optimization scheme (MIHO). Our method adopts an efficient explore-exploit strategy to identify the parameters of dynamic models in a data-driven optimization manner. We utilize MIHO for model parameter identification of the AV-21, a full-scaled autonomous race vehicle. We then incorporate the optimized parameters for the design of model-based planning and control systems of our platform. In experiments, MIHO exhibits more than 13 times faster convergence than traditional parameter identification methods. Furthermore, the parametric models learned via MIHO demonstrate good fitness to the given datasets and show generalization ability in unseen dynamic scenarios. We further conduct extensive field tests to validate our model-based system, demonstrating stable obstacle avoidance and high-speed driving up to 217 km/h at the Indianapolis Motor Speedway and Las Vegas Motor Speedway. The source code for MIHO and videos of the tests are available at https://github.com/hynkis/MIHO.
    Learning Vortex Dynamics for Fluid Inference and Prediction. (arXiv:2301.11494v3 [cs.LG] UPDATED)
    We propose a novel differentiable vortex particle (DVP) method to infer and predict fluid dynamics from a single video. Lying at its core is a particle-based latent space to encapsulate the hidden, Lagrangian vortical evolution underpinning the observable, Eulerian flow phenomena. Our differentiable vortex particles are coupled with a learnable, vortex-to-velocity dynamics mapping to effectively capture the complex flow features in a physically-constrained, low-dimensional space. This representation facilitates the learning of a fluid simulator tailored to the input video that can deliver robust, long-term future predictions. The value of our method is twofold: first, our learned simulator enables the inference of hidden physics quantities (e.g., velocity field) purely from visual observation; secondly, it also supports future prediction, constructing the input video's sequel along with its future dynamics evolution. We compare our method with a range of existing methods on both synthetic and real-world videos, demonstrating improved reconstruction quality, visual plausibility, and physical integrity.
    On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals. (arXiv:2303.06438v2 [eess.SP] UPDATED)
    We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals, which are ubiquitous in many modern-day digital communication systems. Related efforts have been pursued in monaural source separation, where state-of-the-art neural architectures have been adopted to train an end-to-end separator for audio signals (as 1-dimensional time series). In this work, through a prototype problem based on the OFDM source model, we assess -- and question -- the efficacy of using audio-oriented neural architectures in separating signals based on features pertinent to communication waveforms. Perhaps surprisingly, we demonstrate that in some configurations, where perfect separation is theoretically attainable, these audio-oriented neural architectures perform poorly in separating co-channel OFDM waveforms. Yet, we propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures, that can confer about 30 dB improvement in performance.
    Enhancing COVID-19 Severity Analysis through Ensemble Methods. (arXiv:2303.07130v2 [eess.IV] UPDATED)
    Computed Tomography (CT) scans provide a detailed image of the lungs, allowing clinicians to observe the extent of damage caused by COVID-19. The CT severity score (CTSS) based scoring method is used to identify the extent of lung involvement observed on a CT scan. This paper presents a domain knowledge-based pipeline for extracting regions of infection in COVID-19 patients using a combination of image-processing algorithms and a pre-trained UNET model. The severity of the infection is then classified into different categories using an ensemble of three machine-learning models: Extreme Gradient Boosting, Extremely Randomized Trees, and Support Vector Machine. The proposed system was evaluated on a validation dataset in the AI-Enabled Medical Image Analysis Workshop and COVID-19 Diagnosis Competition (AI-MIA-COV19D) and achieved a macro F1 score of 64\%. These results demonstrate the potential of combining domain knowledge with machine learning techniques for accurate COVID-19 diagnosis using CT scans. The implementation of the proposed system for severity analysis is available at \textit{https://github.com/aanandt/Enhancing-COVID-19-Severity-Analysis-through-Ensemble-Methods.git }
    Neural Networks Efficiently Learn Low-Dimensional Representations with SGD. (arXiv:2209.14863v2 [stat.ML] UPDATED)
    We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y=g(\langle\boldsymbol{u_1},\boldsymbol{x}\rangle,...,\langle\boldsymbol{u_k},\boldsymbol{x}\rangle)$ with a noisy link function $g$. We prove that the first-layer weights of the NN converge to the $k$-dimensional principal subspace spanned by the vectors $\boldsymbol{u_1},...,\boldsymbol{u_k}$ of the true model, when online SGD with weight decay is used for training. This phenomenon has several important consequences when $k \ll d$. First, by employing uniform convergence on this smaller subspace, we establish a generalization error bound of $O(\sqrt{{kd}/{T}})$ after $T$ iterations of SGD, which is independent of the width of the NN. We further demonstrate that, SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction, with a sample complexity linear in $d$ (up to log factors), where $f$ is a monotonic function with at most polynomial growth, and $\epsilon$ is the noise. This is in contrast to the known $d^{\Omega(p)}$ sample requirement to learn any degree $p$ polynomial in the kernel regime, and it shows that NNs trained with SGD can outperform the neural tangent kernel at initialization. Finally, we also provide compressibility guarantees for NNs using the approximate low-rank structure produced by SGD.
    AR3n: A Reinforcement Learning-based Assist-As-Needed Controller for Robotic Rehabilitation. (arXiv:2303.00085v2 [cs.RO] UPDATED)
    In this paper, we present AR3n (pronounced as Aaron), an assist-as-needed (AAN) controller that utilizes reinforcement learning to supply adaptive assistance during a robot assisted handwriting rehabilitation task. Unlike previous AAN controllers, our method does not rely on patient specific controller parameters or physical models. We propose the use of a virtual patient model to generalize AR3n across multiple subjects. The system modulates robotic assistance in realtime based on a subject's tracking error, while minimizing the amount of robotic assistance. The controller is experimentally validated through a set of simulations and human subject experiments. Finally, a comparative study with a traditional rule-based controller is conducted to analyze differences in assistance mechanisms of the two controllers.
    Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis. (arXiv:2211.02408v2 [cs.LG] UPDATED)
    While text-to-image synthesis currently enjoys great popularity among researchers and the general public, the security of these models has been neglected so far. Many text-guided image generation models rely on pre-trained text encoders from external sources, and their users trust that the retrieved models will behave as promised. Unfortunately, this might not be the case. We introduce backdoor attacks against text-guided generative models and demonstrate that their text encoders pose a major tampering risk. Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts. By then inserting a single character trigger into the prompt, e.g., a non-Latin character or emoji, the adversary can trigger the model to either generate images with pre-defined attributes or images following a hidden, potentially malicious description. We empirically demonstrate the high effectiveness of our attacks on Stable Diffusion and highlight that the injection process of a single backdoor takes less than two minutes. Besides phrasing our approach solely as an attack, it can also force an encoder to forget phrases related to certain concepts, such as nudity or violence, and help to make image generation safer.
    Toroidal Coordinates: Decorrelating Circular Coordinates With Lattice Reduction. (arXiv:2212.07201v2 [cs.CG] UPDATED)
    The circular coordinates algorithm of de Silva, Morozov, and Vejdemo-Johansson takes as input a dataset together with a cohomology class representing a $1$-dimensional hole in the data; the output is a map from the data into the circle that captures this hole, and that is of minimum energy in a suitable sense. However, when applied to several cohomology classes, the output circle-valued maps can be "geometrically correlated" even if the chosen cohomology classes are linearly independent. It is shown in the original work that less correlated maps can be obtained with suitable integer linear combinations of the cohomology classes, with the linear combinations being chosen by inspection. In this paper, we identify a formal notion of geometric correlation between circle-valued maps which, in the Riemannian manifold case, corresponds to the Dirichlet form, a bilinear form derived from the Dirichlet energy. We describe a systematic procedure for constructing low energy torus-valued maps on data, starting from a set of linearly independent cohomology classes. We showcase our procedure with computational examples. Our main algorithm is based on the Lenstra--Lenstra--Lov\'asz algorithm from computational number theory.
    Multi-Resolution Online Deterministic Annealing: A Hierarchical and Progressive Learning Architecture. (arXiv:2212.08189v2 [cs.LG] UPDATED)
    Hierarchical learning algorithms that gradually approximate a solution to a data-driven optimization problem are essential to decision-making systems, especially under limitations on time and computational resources. In this study, we introduce a general-purpose hierarchical learning architecture that is based on the progressive partitioning of a possibly multi-resolution data space. The optimal partition is gradually approximated by solving a sequence of optimization sub-problems that yield a sequence of partitions with increasing number of subsets. We show that the solution of each optimization problem can be estimated online using gradient-free stochastic approximation updates. As a consequence, a function approximation problem can be defined within each subset of the partition and solved using the theory of two-timescale stochastic approximation algorithms. This simulates an annealing process and defines a robust and interpretable heuristic method to gradually increase the complexity of the learning architecture in a task-agnostic manner, giving emphasis to regions of the data space that are considered more important according to a predefined criterion. Finally, by imposing a tree structure in the progression of the partitions, we provide a means to incorporate potential multi-resolution structure of the data space into this approach, significantly reducing its complexity, while introducing hierarchical variable-rate feature extraction properties similar to certain classes of deep learning architectures. Asymptotic convergence analysis and experimental results are provided for supervised and unsupervised learning problems.
    From Node Interaction to Hop Interaction: New Effective and Scalable Graph Learning Paradigm. (arXiv:2211.11761v2 [cs.LG] UPDATED)
    Existing Graph Neural Networks (GNNs) follow the message-passing mechanism that conducts information interaction among nodes iteratively. While considerable progress has been made, such node interaction paradigms still have the following limitation. First, the scalability limitation precludes the broad application of GNNs in large-scale industrial settings since the node interaction among rapidly expanding neighbors incurs high computation and memory costs. Second, the over-smoothing problem restricts the discrimination ability of nodes, i.e., node representations of different classes will converge to indistinguishable after repeated node interactions. In this work, we propose a novel hop interaction paradigm to address these limitations simultaneously. The core idea is to convert the interaction target among nodes to pre-processed multi-hop features inside each node. We design a simple yet effective HopGNN framework that can easily utilize existing GNNs to achieve hop interaction. Furthermore, we propose a multi-task learning strategy with a self-supervised learning objective to enhance HopGNN. We conduct extensive experiments on 12 benchmark datasets in a wide range of domains, scales, and smoothness of graphs. Experimental results show that our methods achieve superior performance while maintaining high scalability and efficiency. The code is at https://github.com/JC-202/HopGNN.
    Open-Set Representation Learning through Combinatorial Embedding. (arXiv:2106.15278v3 [cs.CV] UPDATED)
    Visual recognition tasks are often limited to dealing with a small subset of classes simply because the labels for the remaining classes are unavailable. We are interested in identifying novel concepts in a dataset through representation learning based on both labeled and unlabeled examples, and extending the horizon of recognition to both known and novel classes. To address this challenging task, we propose a combinatorial learning approach, which naturally clusters the examples in unseen classes using the compositional knowledge given by multiple supervised meta-classifiers on heterogeneous label spaces. The representations given by the combinatorial embedding are made more robust by unsupervised pairwise relation learning. The proposed algorithm discovers novel concepts via a joint optimization for enhancing the discrimitiveness of unseen classes as well as learning the representations of known classes generalizable to novel ones. Our extensive experiments demonstrate remarkable performance gains by the proposed approach on public datasets for image retrieval and image categorization with novel class discovery.
    Laplacian2Mesh: Laplacian-Based Mesh Understanding. (arXiv:2202.00307v2 [cs.CV] UPDATED)
    Geometric deep learning has sparked a rising interest in computer graphics to perform shape understanding tasks, such as shape classification and semantic segmentation. When the input is a polygonal surface, one has to suffer from the irregular mesh structure. Motivated by the geometric spectral theory, we introduce Laplacian2Mesh, a novel and flexible convolutional neural network (CNN) framework for coping with irregular triangle meshes (vertices may have any valence). By mapping the input mesh surface to the multi-dimensional Laplacian-Beltrami space, Laplacian2Mesh enables one to perform shape analysis tasks directly using the mature CNNs, without the need to deal with the irregular connectivity of the mesh structure. We further define a mesh pooling operation such that the receptive field of the network can be expanded while retaining the original vertex set as well as the connections between them. Besides, we introduce a channel-wise self-attention block to learn the individual importance of feature ingredients. Laplacian2Mesh not only decouples the geometry from the irregular connectivity of the mesh structure but also better captures the global features that are central to shape classification and segmentation. Extensive tests on various datasets demonstrate the effectiveness and efficiency of Laplacian2Mesh, particularly in terms of the capability of being vulnerable to noise to fulfill various learning tasks.
    Corrupted Image Modeling for Self-Supervised Visual Pre-Training. (arXiv:2202.03382v2 [cs.CV] UPDATED)
    We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training. CIM uses an auxiliary generator with a small trainable BEiT to corrupt the input image instead of using artificial [MASK] tokens, where some patches are randomly selected and replaced with plausible alternatives sampled from the BEiT output distribution. Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not. The generator and the enhancer are simultaneously trained and synergistically updated. After pre-training, the enhancer can be used as a high-capacity visual encoder for downstream tasks. CIM is a general and flexible visual pre-training framework that is suitable for various network architectures. For the first time, CIM demonstrates that both ViT and CNN can learn rich visual representations using a unified, non-Siamese framework. Experimental results show that our approach achieves compelling results in vision benchmarks, such as ImageNet classification and ADE20K semantic segmentation.
    Direct-Effect Risk Minimization for Domain Generalization. (arXiv:2211.14594v4 [cs.LG] UPDATED)
    We study the problem of out-of-distribution (o.o.d.) generalization where spurious correlations of attributes vary across training and test domains. This is known as the problem of correlation shift and has posed concerns on the reliability of machine learning. In this work, we introduce the concepts of direct and indirect effects from causal inference to the domain generalization problem. We argue that models that learn direct effects minimize the worst-case risk across correlation-shifted domains. To eliminate the indirect effects, our algorithm consists of two stages: in the first stage, we learn an indirect-effect representation by minimizing the prediction error of domain labels using the representation and the class labels; in the second stage, we remove the indirect effects learned in the first stage by matching each data with another data of similar indirect-effect representation but of different class labels in the training and validation phase. Our approach is shown to be compatible with existing methods and improve the generalization performance of them on correlation-shifted datasets. Experiments on 5 correlation-shifted datasets and the DomainBed benchmark verify the effectiveness of our approach.
    Human-Guided Fair Classification for Natural Language Processing. (arXiv:2212.10154v2 [cs.CL] UPDATED)
    Text classifiers have promising applications in high-stake tasks such as resume screening and content moderation. These classifiers must be fair and avoid discriminatory decisions by being invariant to perturbations of sensitive attributes such as gender or ethnicity. However, there is a gap between human intuition about these perturbations and the formal similarity specifications capturing them. While existing research has started to address this gap, current methods are based on hardcoded word replacements, resulting in specifications with limited expressivity or ones that fail to fully align with human intuition (e.g., in cases of asymmetric counterfactuals). This work proposes novel methods for bridging this gap by discovering expressive and intuitive individual fairness specifications. We show how to leverage unsupervised style transfer and GPT-3's zero-shot capabilities to automatically generate expressive candidate pairs of semantically similar sentences that differ along sensitive attributes. We then validate the generated pairs via an extensive crowdsourcing study, which confirms that a lot of these pairs align with human intuition about fairness in the context of toxicity classification. Finally, we show how limited amounts of human feedback can be leveraged to learn a similarity specification that can be used to train downstream fairness-aware models.
    Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration. (arXiv:2211.02397v2 [eess.AS] UPDATED)
    Diffusion-based generative models have had a high impact on the computer vision and speech processing communities these past years. Besides data generation tasks, they have also been employed for data restoration tasks like speech enhancement and dereverberation. While discriminative models have traditionally been argued to be more powerful e.g. for speech enhancement, generative diffusion approaches have recently been shown to narrow this performance gap considerably. In this paper, we systematically compare the performance of generative diffusion models and discriminative approaches on different speech restoration tasks. For this, we extend our prior contributions on diffusion-based speech enhancement in the complex time-frequency domain to the task of bandwith extension. We then compare it to a discriminatively trained neural network with the same network architecture on three restoration tasks, namely speech denoising, dereverberation and bandwidth extension. We observe that the generative approach performs globally better than its discriminative counterpart on all tasks, with the strongest benefit for non-additive distortion models, like in dereverberation and bandwidth extension. Code and audio examples can be found online at https://uhh.de/inf-sp-sgmsemultitask
    Hierarchical Policy Blending As Optimal Transport. (arXiv:2212.01938v2 [cs.RO] UPDATED)
    We present hierarchical policy blending as optimal transport (HiPBOT). This hierarchical framework adapts the weights of low-level reactive expert policies, adding a look-ahead planning layer on the parameter space of a product of expert policies and agents. Our high-level planner realizes a policy blending via unbalanced optimal transport, consolidating the scaling of underlying Riemannian motion policies, effectively adjusting their Riemannian matrix, and deciding over the priorities between experts and agents, guaranteeing safety and task success. Our experimental results in a range of application scenarios from low-dimensional navigation to high-dimensional whole-body control showcase the efficacy and efficiency of HiPBOT, which outperforms state-of-the-art baselines that either perform probabilistic inference or define a tree structure of experts, paving the way for new applications of optimal transport to robot control. More material at https://sites.google.com/view/hipobot
    Empirical Study of Named Entity Recognition Performance Using Distribution-aware Word Embedding. (arXiv:2109.01636v2 [cs.CL] UPDATED)
    With the fast development of Deep Learning techniques, Named Entity Recognition (NER) is becoming more and more important in the information extraction task. The greatest difficulty that the NER task faces is to keep the detectability even when types of NE and documents are unfamiliar. Realizing that the specificity information may contain potential meanings of a word and generate semantic-related features for word embedding, we develop a distribution-aware word embedding and implement three different methods to make use of the distribution information in a NER framework. And the result shows that the performance of NER will be improved if the word specificity is incorporated into existing NER methods.
    Natural Language Processing Methods to Identify Oncology Patients at High Risk for Acute Care with Clinical Notes. (arXiv:2209.13860v2 [cs.CL] UPDATED)
    Clinical notes are an essential component of a health record. This paper evaluates how natural language processing (NLP) can be used to identify the risk of acute care use (ACU) in oncology patients, once chemotherapy starts. Risk prediction using structured health data (SHD) is now standard, but predictions using free-text formats are complex. This paper explores the use of free-text notes for the prediction of ACU instead of SHD. Deep Learning models were compared to manually engineered language features. Results show that SHD models minimally outperform NLP models; an l1-penalised logistic regression with SHD achieved a C-statistic of 0.748 (95%-CI: 0.735, 0.762), while the same model with language features achieved 0.730 (95%-CI: 0.717, 0.745) and a transformer-based model achieved 0.702 (95%-CI: 0.688, 0.717). This paper shows how language models can be used in clinical applications and underlines how risk bias is different for diverse patient groups, even using only free-text data.
    Noisy Low-rank Matrix Optimization: Geometry of Local Minima and Convergence Rate. (arXiv:2203.03899v3 [math.OC] UPDATED)
    This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning. This problem in the special case of matrix sensing has been studied extensively through the notion of Restricted Isometry Property (RIP), leading to a wealth of results on the geometric landscape of the problem and the convergence rate of common algorithms. However, the existing results can handle the problem in the case with a general objective function subject to noisy data only when the RIP constant is close to 0. In this paper, we develop a new mathematical framework to solve the above-mentioned problem with a far less restrictive RIP constant. We prove that as long as the RIP constant of the noiseless objective is less than $1/3$, any spurious local solution of the noisy optimization problem must be close to the ground truth solution. By working through the strict saddle property, we also show that an approximate solution can be found in polynomial time. We characterize the geometry of the spurious local minima of the problem in a local region around the ground truth in the case when the RIP constant is greater than $1/3$. Compared to the existing results in the literature, this paper offers the strongest RIP bound and provides a complete theoretical analysis on the global and local optimization landscapes of general low-rank optimization problems under random corruptions from any finite-variance family.
    Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance. (arXiv:2208.08664v2 [cs.CV] UPDATED)
    Denoising diffusion probabilistic models (DDPMs) are a recent family of generative models that achieve state-of-the-art results. In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks. Therefore, while traditional classifiers may achieve good accuracy scores, their gradients are possibly unreliable and might hinder the improvement of the generation results. Recent work discovered that adversarially robust classifiers exhibit gradients that are aligned with human perception, and these could better guide a generative process towards semantically meaningful images. We utilize this observation by defining and training a time-dependent adversarially robust classifier and use it as guidance for a generative diffusion model. In experiments on the highly challenging and diverse ImageNet dataset, our scheme introduces significantly more intelligible intermediate gradients, better alignment with theoretical findings, as well as improved generation results under several evaluation metrics. Furthermore, we conduct an opinion survey whose findings indicate that human raters prefer our method's results.
    Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels. (arXiv:2209.13476v4 [eess.IV] UPDATED)
    Recent studies on contrastive learning have achieved remarkable performance solely by leveraging few labels in the context of medical image segmentation. Existing methods mainly focus on instance discrimination and invariant mapping. However, they face three common pitfalls: (1) tailness: medical image data usually follows an implicit long-tail class distribution. Blindly leveraging all pixels in training hence can lead to the data imbalance issues, and cause deteriorated performance; (2) consistency: it remains unclear whether a segmentation model has learned meaningful and yet consistent anatomical features due to the intra-class variations between different anatomical features; and (3) diversity: the intra-slice correlations within the entire dataset have received significantly less attention. This motivates us to seek a principled approach for strategically making use of the dataset itself to discover similar yet distinct samples from different anatomical views. In this paper, we introduce a novel semi-supervised 2D medical image segmentation framework termed Mine yOur owN Anatomy (MONA), and make three contributions. First, prior work argues that every pixel equally matters to the model training; we observe empirically that this alone is unlikely to define meaningful anatomical features, mainly due to lacking the supervision signal. We show two simple solutions towards learning invariances - through the use of stronger data augmentations and nearest neighbors. Second, we construct a set of objectives that encourage the model to be capable of decomposing medical images into a collection of anatomical features in an unsupervised manner. Lastly, we both empirically and theoretically, demonstrate the efficacy of our MONA on three benchmark datasets with different labeled settings, achieving new state-of-the-art under different labeled semi-supervised settings
    MPCFormer: fast, performant and private Transformer inference with MPC. (arXiv:2211.01452v2 [cs.LG] UPDATED)
    Enabling private inference is crucial for many cloud inference services that are based on Transformer models. However, existing private inference solutions can increase the inference latency by more than 60x or significantly compromise the inference quality. In this paper, we design the framework MPCFORMER as a practical solution, using Secure Multi-Party Computation (MPC) and Knowledge Distillation (KD). Through extensive evaluations, we show that MPCFORMER significantly speeds up Transformer inference in MPC settings while achieving similar ML performance to the input model. On the IMDb dataset, it achieves similar performance to BERTBASE, while being 5.3x faster. On the GLUE benchmark, it achieves 97% performance of BERTBASE with a 2.2x speedup. MPCFORMER remains effective with different trained Transformer weights such as ROBERTABASE and larger models including BERTLarge. Code is available at https://github.com/MccRee177/MPCFormer.
    LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space. (arXiv:2209.12746v2 [cs.CV] UPDATED)
    As the methods evolve, inversion is mainly divided into two steps. The first step is Image Embedding, in which an encoder or optimization process embeds images to get the corresponding latent codes. Afterward, the second step aims to refine the inversion and editing results, which we named Result Refinement. Although the second step significantly improves fidelity, perception and editability are almost unchanged, deeply dependent on inverse latent codes attained in the first step. Therefore, a crucial problem is gaining the latent codes with better perception and editability while retaining the reconstruction fidelity. In this work, we first point out that these two characteristics are related to the degree of alignment (or disalignment) of the inverse codes with the synthetic distribution. Then, we propose Latent Space Alignment Inversion Paradigm (LSAP), which consists of evaluation metric and solution for this problem. Specifically, we introduce Normalized Style Space ($\mathcal{S^N}$ space) and $\mathcal{S^N}$ Cosine Distance (SNCD) to measure disalignment of inversion methods. Since our proposed SNCD is differentiable, it can be optimized in both encoder-based and optimization-based embedding methods to conduct a uniform solution. Extensive experiments in various domains demonstrate that SNCD effectively reflects perception and editability, and our alignment paradigm archives the state-of-the-art in both two steps. Code is available on https://github.com/caopulan/GANInverter/tree/main/configs/lsap.
    Beyond Object Recognition: A New Benchmark towards Object Concept Learning. (arXiv:2212.02710v2 [cs.CV] UPDATED)
    Understanding objects is a central building block of artificial intelligence, especially for embodied AI. Even though object recognition excels with deep learning, current machines still struggle to learn higher-level knowledge, e.g., what attributes an object has, and what can we do with an object. In this work, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possesses these affordances. To support OCL, we build a densely annotated knowledge base including extensive labels for three levels of object concept (category, attribute, affordance), and the causal relations of three levels. By analyzing the causal structure of OCL, we present a baseline, Object Concept Reasoning Network (OCRN). It leverages causal intervention and concept instantiation to infer the three levels following their causal relations. In experiments, OCRN effectively infers the object knowledge while following the causalities well. Our data and code are available at https://mvig-rhos.com/ocl.
    Deep Incubation: Training Large Models by Divide-and-Conquering. (arXiv:2212.04129v2 [cs.CV] UPDATED)
    Recent years have witnessed a remarkable success of large deep learning models. However, training these models is challenging due to high computational costs, painfully slow convergence, and overfitting issues. In this paper, we present Deep Incubation, a novel approach that enables the efficient and effective training of large models by dividing them into smaller sub-modules that can be trained separately and assembled seamlessly. A key challenge for implementing this idea is to ensure the compatibility of the independently trained sub-modules. To address this issue, we first introduce a global, shared meta model, which is leveraged to implicitly link all the modules together, and can be designed as an extremely small network with negligible computational overhead. Then we propose a module incubation algorithm, which trains each sub-module to replace the corresponding component of the meta model and accomplish a given learning task. Despite the simplicity, our approach effectively encourages each sub-module to be aware of its role in the target large model, such that the finally-learned sub-modules can collaborate with each other smoothly after being assembled. Empirically, our method outperforms end-to-end (E2E) training in terms of both final accuracy and training efficiency. For example, on top of ViT-Huge, it improves the accuracy by 2.7% on ImageNet or achieves similar performance with 4x less training time. Notably, the gains are significant for downstream tasks as well (e.g., object detection and image segmentation on COCO and ADE20K). Code is available at https://github.com/LeapLabTHU/Deep-Incubation.
    Effectively Modeling Time Series with Simple Discrete State Spaces. (arXiv:2303.09489v1 [cs.LG])
    Time series modeling is a well-established problem, which often requires that methods (1) expressively represent complicated dependencies, (2) forecast long horizons, and (3) efficiently train over long sequences. State-space models (SSMs) are classical models for time series, and prior works combine SSMs with deep learning layers for efficient sequence modeling. However, we find fundamental limitations with these prior approaches, proving their SSM representations cannot express autoregressive time series processes. We thus introduce SpaceTime, a new state-space time series architecture that improves all three criteria. For expressivity, we propose a new SSM parameterization based on the companion matrix -- a canonical representation for discrete-time processes -- which enables SpaceTime's SSM layers to learn desirable autoregressive processes. For long horizon forecasting, we introduce a "closed-loop" variation of the companion SSM, which enables SpaceTime to predict many future time-steps by generating its own layer-wise inputs. For efficient training and inference, we introduce an algorithm that reduces the memory and compute of a forward pass with the companion matrix. With sequence length $\ell$ and state-space size $d$, we go from $\tilde{O}(d \ell)$ na\"ively to $\tilde{O}(d + \ell)$. In experiments, our contributions lead to state-of-the-art results on extensive and diverse benchmarks, with best or second-best AUROC on 6 / 7 ECG and speech time series classification, and best MSE on 14 / 16 Informer forecasting tasks. Furthermore, we find SpaceTime (1) fits AR($p$) processes that prior deep SSMs fail on, (2) forecasts notably more accurately on longer horizons than prior state-of-the-art, and (3) speeds up training on real-world ETTh1 data by 73% and 80% relative wall-clock time over Transformers and LSTMs.
    Language Model Decoding as Likelihood-Utility Alignment. (arXiv:2210.07228v2 [cs.CL] UPDATED)
    A critical component of a successful language generation pipeline is the decoding algorithm. However, the general principles that should guide the choice of a decoding algorithm remain unclear. Previous works only compare decoding algorithms in narrow scenarios, and their findings do not generalize across tasks. We argue that the misalignment between the model's likelihood and the task-specific notion of utility is the key factor to understanding the effectiveness of decoding algorithms. To structure the discussion, we introduce a taxonomy of misalignment mitigation strategies (MMSs), providing a unifying view of decoding as a tool for alignment. The MMS taxonomy groups decoding algorithms based on their implicit assumptions about likelihood--utility misalignment, yielding general statements about their applicability across tasks. Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide empirical evidence supporting the proposed taxonomy and a set of principles to structure reasoning when choosing a decoding algorithm. Crucially, our analysis is the first to relate likelihood-based decoding algorithms with algorithms that rely on external information, such as value-guided methods and prompting, and covers the most diverse set of tasks to date. Code, data, and models are available at https://github.com/epfl-dlab/understanding-decoding.
    Variational Principles for Mirror Descent and Mirror Langevin Dynamics. (arXiv:2303.09532v1 [math.OC])
    Mirror descent, introduced by Nemirovski and Yudin in the 1970s, is a primal-dual convex optimization method that can be tailored to the geometry of the optimization problem at hand through the choice of a strongly convex potential function. It arises as a basic primitive in a variety of applications, including large-scale optimization, machine learning, and control. This paper proposes a variational formulation of mirror descent and of its stochastic variant, mirror Langevin dynamics. The main idea, inspired by the classic work of Brezis and Ekeland on variational principles for gradient flows, is to show that mirror descent emerges as a closed-loop solution for a certain optimal control problem, and the Bellman value function is given by the Bregman divergence between the initial condition and the global minimizer of the objective function.
    DISCOVER: Deep identification of symbolically concise open-form PDEs via enhanced reinforcement-learning. (arXiv:2210.02181v2 [cs.LG] UPDATED)
    The working mechanisms of complex natural systems tend to abide by concise and profound partial differential equations (PDEs). Methods that directly mine equations from data are called PDE discovery, which reveals consistent physical laws and facilitates our adaptive interaction with the natural world. In this paper, an enhanced deep reinforcement-learning framework is proposed to uncover symbolically concise open-form PDEs with little prior knowledge. Particularly, based on a symbol library of basic operators and operands, a structure-aware recurrent neural network agent is designed and seamlessly combined with the sparse regression method to generate concise and open-form PDE expressions. All of the generated PDEs are evaluated by a meticulously designed reward function by balancing fitness to data and parsimony, and updated by the model-based reinforcement learning in an efficient way. Customized constraints and regulations are formulated to guarantee the rationality of PDEs in terms of physics and mathematics. The experiments demonstrate that our framework is capable of mining open-form governing equations of several dynamic systems, even with compound equation terms, fractional structure, and high-order derivatives, with excellent efficiency. Without the need for prior knowledge, this method shows great potential for knowledge discovery in more complicated circumstances with exceptional efficiency and scalability.
    Knowledge Discovery from Atomic Structures using Feature Importances. (arXiv:2303.09453v1 [cond-mat.mtrl-sci])
    Molecular-level understanding of the interactions between the constituents of an atomic structure is essential for designing novel materials in various applications. This need goes beyond the basic knowledge of the number and types of atoms, their chemical composition, and the character of the chemical interactions. The bigger picture takes place on the quantum level which can be addressed by using the Density-functional theory (DFT). Use of DFT, however, is a computationally taxing process, and its results do not readily provide easily interpretable insight into the atomic interactions which would be useful information in material design. An alternative way to address atomic interactions is to use an interpretable machine learning approach, where a predictive DFT surrogate is constructed and analyzed. The purpose of this paper is to propose such a procedure using a modification of the recently published interpretable distance-based regression method. Our tests with a representative benchmark set of molecules and a complex hybrid nanoparticle confirm the viability and usefulness of the proposed approach.
    Well-classified Examples are Underestimated in Classification with Deep Neural Networks. (arXiv:2110.06537v6 [cs.LG] UPDATED)
    The conventional wisdom behind learning deep classification models is to focus on bad-classified examples and ignore well-classified examples that are far from the decision boundary. For instance, when training with cross-entropy loss, examples with higher likelihoods (i.e., well-classified examples) contribute smaller gradients in back-propagation. However, we theoretically show that this common practice hinders representation learning, energy optimization, and margin growth. To counteract this deficiency, we propose to reward well-classified examples with additive bonuses to revive their contribution to the learning process. This counterexample theoretically addresses these three issues. We empirically support this claim by directly verifying the theoretical results or significant performance improvement with our counterexample on diverse tasks, including image classification, graph classification, and machine translation. Furthermore, this paper shows that we can deal with complex scenarios, such as imbalanced classification, OOD detection, and applications under adversarial attacks because our idea can solve these three issues. Code is available at: https://github.com/lancopku/well-classified-examples-are-underestimated.
    WebSHAP: Towards Explaining Any Machine Learning Models Anywhere. (arXiv:2303.09545v1 [cs.LG])
    As machine learning (ML) is increasingly integrated into our everyday Web experience, there is a call for transparent and explainable web-based ML. However, existing explainability techniques often require dedicated backend servers, which limit their usefulness as the Web community moves toward in-browser ML for lower latency and greater privacy. To address the pressing need for a client-side explainability solution, we present WebSHAP, the first in-browser tool that adapts the state-of-the-art model-agnostic explainability technique SHAP to the Web environment. Our open-source tool is developed with modern Web technologies such as WebGL that leverage client-side hardware capabilities and make it easy to integrate into existing Web ML applications. We demonstrate WebSHAP in a usage scenario of explaining ML-based loan approval decisions to loan applicants. Reflecting on our work, we discuss the opportunities and challenges for future research on transparent Web ML. WebSHAP is available at https://github.com/poloclub/webshap.
    Parametric Classification for Generalized Category Discovery: A Baseline Study. (arXiv:2211.11727v2 [cs.CV] UPDATED)
    Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. Previous studies argued that parametric classifiers are prone to overfitting to seen categories, and endorsed using a non-parametric classifier formed with semi-supervised k-means. However, in this study, we investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We demonstrate that two prediction biases exist: the classifier tends to predict seen classes more often, and produces an imbalanced distribution across seen and novel categories. Based on these findings, we propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers. We hope the investigation and proposed simple framework can serve as a strong baseline to facilitate future studies in this field. Our code is available at: https://github.com/CVMI-Lab/SimGCD.
    Approximation of functions with one-bit neural networks. (arXiv:2112.09181v2 [cs.LG] UPDATED)
    The celebrated universal approximation theorems for neural networks roughly state that any reasonable function can be arbitrarily well-approximated by a network whose parameters are appropriately chosen real numbers. This paper examines the approximation capabilities of one-bit neural networks -- those whose nonzero parameters are $\pm a$ for some fixed $a\not=0$. One of our main theorems shows that for any $f\in C^s([0,1]^d)$ with $\|f\|_\infty<1$ and error $\varepsilon$, there is a $f_{NN}$ such that $|f(\boldsymbol{x})-f_{NN}(\boldsymbol{x})|\leq \varepsilon$ for all $\boldsymbol{x}$ away from the boundary of $[0,1]^d$, and $f_{NN}$ is either implementable by a $\{\pm 1\}$ quadratic network with $O(\varepsilon^{-2d/s})$ parameters or a $\{\pm \frac 1 2 \}$ ReLU network with $O(\varepsilon^{-2d/s}\log (1/\varepsilon))$ parameters, as $\varepsilon\to0$. We establish new approximation results for iterated multivariate Bernstein operators, error estimates for noise-shaping quantization on the Bernstein basis, and novel implementation of the Bernstein polynomials by one-bit quadratic and ReLU neural networks.
    Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval. (arXiv:2212.04267v2 [cs.CV] UPDATED)
    Vision-Language Pretraining (VLP) and Foundation models have been the go-to recipe for achieving SoTA performance on general benchmarks. However, leveraging these powerful techniques for more complex vision-language tasks, such as cooking applications, with more structured input data, is still little investigated. In this work, we propose to leverage these techniques for structured-text based computational cuisine tasks. Our strategy, dubbed VLPCook, first transforms existing image-text pairs to image and structured-text pairs. This allows to pretrain our VLPCook model using VLP objectives adapted to the strutured data of the resulting datasets, then finetuning it on downstream computational cooking tasks. During finetuning, we also enrich the visual encoder, leveraging pretrained foundation models (e.g. CLIP) to provide local and global textual context. VLPCook outperforms current SoTA by a significant margin (+3.3 Recall@1 absolute improvement) on the task of Cross-Modal Food Retrieval on the large Recipe1M dataset. We conduct further experiments on VLP to validate their importance, especially on the Recipe1M+ dataset. Finally, we validate the generalization of the approach to other tasks (i.e, Food Recognition) and domains with structured text such as the Medical domain on the ROCO dataset. The code is available here: https://github.com/mshukor/VLPCook
    Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems. (arXiv:2007.03481v6 [cs.LG] UPDATED)
    This paper presents an inverse reinforcement learning~(IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify if these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality wrt the observed strategies. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function.To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search. As a real-world example, we illustrate using a YouTube dataset comprising metadata from 190000 videos how the proposed IRL method predicts user engagement in online multimedia platforms with high accuracy. Finally, for finite datasets, we propose an IRL detection algorithm and give finite sample bounds on its error probabilities.
    Transient Stability Analysis with Physics-Informed Neural Networks. (arXiv:2106.13638v3 [cs.LG] UPDATED)
    We explore the possibility to use physics-informed neural networks to drastically accelerate the solution of ordinary differential-algebraic equations that govern the power system dynamics. When it comes to transient stability assessment, the traditionally applied methods either carry a significant computational burden, require model simplifications, or use overly conservative surrogate models. Conventional neural networks can circumvent these limitations but are faced with high demand of high-quality training datasets, while they ignore the underlying governing equations. Physics-informed neural networks are different: they incorporate the power system differential algebraic equations directly into the neural network training and drastically reduce the need for training data. This paper takes a deep dive into the performance of physics-informed neural networks for power system transient stability assessment. Introducing a new neural network training procedure to facilitate a thorough comparison, we explore how physics-informed neural networks compare with conventional differential-algebraic solvers and classical neural networks in terms of computation time, requirements in data, and prediction accuracy. We illustrate the findings on the Kundur two-area system, and assess the opportunities and challenges of physics-informed neural networks to serve as a transient stability analysis tool, highlighting possible pathways to further develop this method.
    Optimal learning of quantum Hamiltonians from high-temperature Gibbs states. (arXiv:2108.04842v3 [quant-ph] UPDATED)
    We study the problem of learning a Hamiltonian $H$ to precision $\varepsilon$, supposing we are given copies of its Gibbs state $\rho=\exp(-\beta H)/\operatorname{Tr}(\exp(-\beta H))$ at a known inverse temperature $\beta$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (Nature Physics, 2021, arXiv:2004.07266) recently studied the sample complexity (number of copies of $\rho$ needed) of this problem for geometrically local $N$-qubit Hamiltonians. In the high-temperature (low $\beta$) regime, their algorithm has sample complexity poly$(N, 1/\beta,1/\varepsilon)$ and can be implemented with polynomial, but suboptimal, time complexity. In this paper, we study the same question for a more general class of Hamiltonians. We show how to learn the coefficients of a Hamiltonian to error $\varepsilon$ with sample complexity $S = O(\log N/(\beta\varepsilon)^{2})$ and time complexity linear in the sample size, $O(S N)$. Furthermore, we prove a matching lower bound showing that our algorithm's sample complexity is optimal, and hence our time complexity is also optimal. In the appendix, we show that virtually the same algorithm can be used to learn $H$ from a real-time evolution unitary $e^{-it H}$ in a small $t$ regime with similar sample and time complexity.
    SemDeDup: Data-efficient learning at web-scale through semantic deduplication. (arXiv:2303.09540v1 [cs.LG])
    Progress in machine learning has been driven in large part by massive increases in data. However, large web-scale datasets such as LAION are largely uncurated beyond searches for exact duplicates, potentially leaving much redundancy. Here, we introduce SemDeDup, a method which leverages embeddings from pre-trained models to identify and remove semantic duplicates: data pairs which are semantically similar, but not exactly identical. Removing semantic duplicates preserves performance and speeds up learning. Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time. Moreover, performance increases out of distribution. Also, analyzing language models trained on C4, a partially curated dataset, we show that SemDeDup improves over prior approaches while providing efficiency gains. SemDeDup provides an example of how simple ways of leveraging quality embeddings can be used to make models learn faster with less data.
    Towards Lower Bounds on the Depth of ReLU Neural Networks. (arXiv:2105.14835v4 [cs.LG] UPDATED)
    We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
    Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports. (arXiv:2303.09395v1 [cs.CL])
    Electrocardiogram (ECG) synthesis is the area of research focused on generating realistic synthetic ECG signals for medical use without concerns over annotation costs or clinical data privacy restrictions. Traditional ECG generation models consider a single ECG lead and utilize GAN-based generative models. These models can only generate single lead samples and require separate training for each diagnosis class. The diagnosis classes of ECGs are insufficient to capture the intricate differences between ECGs depending on various features (e.g. patient demographic details, co-existing diagnosis classes, etc.). To alleviate these challenges, we present a text-to-ECG task, in which textual inputs are used to produce ECG outputs. Then we propose Auto-TTE, an autoregressive generative model conditioned on clinical text reports to synthesize 12-lead ECGs, for the first time to our knowledge. We compare the performance of our model with other representative models in text-to-speech and text-to-image. Experimental results show the superiority of our model in various quantitative evaluations and qualitative analysis. Finally, we conduct a user study with three board-certified cardiologists to confirm the fidelity and semantic alignment of generated samples. our code will be available at https://github.com/TClife/text_to_ecg
    SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments. (arXiv:2303.09555v1 [cs.RO])
    While significant research progress has been made in robot learning for control, unique challenges arise when simultaneously co-optimizing morphology. Existing work has typically been tailored for particular environments or representations. In order to more fully understand inherent design and performance tradeoffs and accelerate the development of new breeds of soft robots, a comprehensive virtual platform with well-established tasks, environments, and evaluation metrics is needed. In this work, we introduce SoftZoo, a soft robot co-design platform for locomotion in diverse environments. SoftZoo supports an extensive, naturally-inspired material set, including the ability to simulate environments such as flat ground, desert, wetland, clay, ice, snow, shallow water, and ocean. Further, it provides a variety of tasks relevant for soft robotics, including fast locomotion, agile turning, and path following, as well as differentiable design representations for morphology and control. Combined, these elements form a feature-rich platform for analysis and development of soft robot co-design algorithms. We benchmark prevalent representations and co-design algorithms, and shed light on 1) the interplay between environment, morphology, and behavior; 2) the importance of design space representations; 3) the ambiguity in muscle formation and controller synthesis; and 4) the value of differentiable physics. We envision that SoftZoo will serve as a standard platform and template an approach toward the development of novel representations and algorithms for co-designing soft robots' behavioral and morphological intelligence.
    3D Masked Autoencoding and Pseudo-labeling for Domain Adaptive Segmentation of Heterogeneous Infant Brain MRI. (arXiv:2303.09373v1 [cs.CV])
    Robust segmentation of infant brain MRI across multiple ages, modalities, and sites remains challenging due to the intrinsic heterogeneity caused by different MRI scanners, vendors, or acquisition sequences, as well as varying stages of neurodevelopment. To address this challenge, previous studies have explored domain adaptation (DA) algorithms from various perspectives, including feature alignment, entropy minimization, contrast synthesis (style transfer), and pseudo-labeling. This paper introduces a novel framework called MAPSeg (Masked Autoencoding and Pseudo-labelling Segmentation) to address the challenges of cross-age, cross-modality, and cross-site segmentation of subcortical regions in infant brain MRI. Utilizing 3D masked autoencoding as well as masked pseudo-labeling, the model is able to jointly learn from labeled source domain data and unlabeled target domain data. We evaluated our framework on expert-annotated datasets acquired from different ages and sites. MAPSeg consistently outperformed other methods, including previous state-of-the-art supervised baselines, domain generalization, and domain adaptation frameworks in segmenting subcortical regions regardless of age, modality, or acquisition site. The code and pretrained encoder will be publicly available at https://github.com/XuzheZ/MAPSeg
    WaveMix: A Resource-efficient Neural Network for Image Analysis. (arXiv:2205.14375v4 [cs.CV] UPDATED)
    We propose WaveMix -- a novel neural architecture for computer vision that is resource-efficient yet generalizable and scalable. WaveMix networks achieve comparable or better accuracy than the state-of-the-art convolutional neural networks, vision transformers, and token mixers for several tasks, establishing new benchmarks for segmentation on Cityscapes; and for classification on Places-365, five EMNIST datasets, and iNAT-mini. Remarkably, WaveMix architectures require fewer parameters to achieve these benchmarks compared to the previous state-of-the-art. Moreover, when controlled for the number of parameters, WaveMix requires lesser GPU RAM, which translates to savings in time, cost, and energy. To achieve these gains we used multi-level two-dimensional discrete wavelet transform (2D-DWT) in WaveMix blocks, which has the following advantages: (1) It reorganizes spatial information based on three strong image priors -- scale-invariance, shift-invariance, and sparseness of edges, (2) in a lossless manner without adding parameters, (3) while also reducing the spatial sizes of feature maps, which reduces the memory and time required for forward and backward passes, and (4) expanding the receptive field faster than convolutions do. The whole architecture is a stack of self-similar and resolution-preserving WaveMix blocks, which allows architectural flexibility for various tasks and levels of resource availability. Our code and trained models are publicly available.
    Fast Distributed k-Means with a Small Number of Rounds. (arXiv:2201.13217v2 [cs.DC] UPDATED)
    We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines, and a coordinator communicates with these machines to calculate the output clustering. Our algorithm guarantees a cost approximation factor and a number of communication rounds that depend only on the computational capacity of the coordinator. Moreover, the algorithm includes a built-in stopping mechanism, which allows it to use fewer communication rounds whenever possible. We show both theoretically and empirically that in many natural cases, indeed 1-4 rounds suffice. In comparison with the popular k-means|| algorithm, our approach allows exploiting a larger coordinator capacity to obtain a smaller number of rounds. Our experiments show that the k-means cost obtained by the proposed algorithm is usually better than the cost obtained by k-means||, even when the latter is allowed a larger number of rounds. Moreover, the machine running time in our approach is considerably smaller than that of k-means||. Code for running the algorithm and experiments is available at https://github.com/selotape/distributed_k_means.
    FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles. (arXiv:2206.06513v2 [cs.CG] UPDATED)
    Datasets with non-trivial large scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large scale topology. We formalize this point of view, and, as an application, we describe an algorithm which takes as input a dataset together with an initial representation of it in Euclidean space, assumed to recover part of its large scale topology, and outputs a new representation that integrates local representations, obtained through local linear dimensionality reduction, along the initial global representation. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in lower target dimension than various well known metric-based dimensionality reduction algorithms.
    An adaptation of InfoMap to absorbing random walks using absorption-scaled graphs. (arXiv:2112.10953v2 [cs.SI] UPDATED)
    InfoMap is a popular approach for detecting densely connected "communities" of nodes in networks. To detect such communities, InfoMap uses random walks and ideas from information theory. Motivated by the dynamics of disease spread on networks, whose nodes may have heterogeneous disease-removal rates, we adapt InfoMap to absorbing random walks. To do this, we use absorption-scaled graphs, in which the edge weights are scaled according to absorption rates, along with Markov time sweeping. One of our adaptations of InfoMap converges to the standard version of InfoMap in the limit in which the node-absorption rates approach $0$. The community structure that we obtain using our adaptations of InfoMap can differ markedly from the community structure that one detects using methods that do not take node-absorption rates into account. Additionally, we demonstrate that the community structure that is induced by local dynamics can have important implications for susceptible-infected-recovered (SIR) dynamics on ring-lattice networks. For example, we find situations in which the outbreak duration is maximized when a moderate number of nodes have large node-absorption rates.
    LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations. (arXiv:2303.09384v1 [cs.SE])
    Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of generating code snippets from Natural Language (NL) descriptions by learning languages and programming practices from public GitHub repositories. Although LLMs promise an effortless NL-driven deployment of software applications, the security of the code they generate has not been extensively investigated nor documented. In this work, we present LLMSecEval, a dataset containing 150 NL prompts that can be leveraged for assessing the security performance of such models. Such prompts are NL descriptions of code snippets prone to various security vulnerabilities listed in MITRE's Top 25 Common Weakness Enumeration (CWE) ranking. Each prompt in our dataset comes with a secure implementation example to facilitate comparative evaluations against code produced by LLMs. As a practical application, we show how LLMSecEval can be used for evaluating the security of snippets automatically generated from NL descriptions.
    Deep Metric Learning for Unsupervised Remote Sensing Change Detection. (arXiv:2303.09536v1 [cs.CV])
    Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs), which aids in various RS applications such as land cover, land use, human development analysis, and disaster response. The performance of existing RS-CD methods is attributed to training on large annotated datasets. Furthermore, most of these models are less transferable in the sense that the trained model often performs very poorly when there is a domain gap between training and test datasets. This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues. Given an MT-RSI, the proposed method generates corresponding change probability map by iteratively optimizing an unsupervised CD loss without training it on a large dataset. Our unsupervised CD method consists of two interconnected deep networks, namely Deep-Change Probability Generator (D-CPG) and Deep-Feature Extractor (D-FE). The D-CPG is designed to predict change and no change probability maps for a given MT-RSI, while D-FE is used to extract deep features of MT-RSI that will be further used in the proposed unsupervised CD loss. We use transfer learning capability to initialize the parameters of D-FE. We iteratively optimize the parameters of D-CPG and D-FE for a given MT-RSI by minimizing the proposed unsupervised ``similarity-dissimilarity loss''. This loss is motivated by the principle of metric learning where we simultaneously maximize the distance between change pair-wise pixels while minimizing the distance between no-change pair-wise pixels in bi-temporal image domain and their deep feature domain. The experiments conducted on three CD datasets show that our unsupervised CD method achieves significant improvements over the state-of-the-art supervised and unsupervised CD methods. Code available at https://github.com/wgcban/Metric-CD
    Improving CNN-base Stock Trading By Considering Data Heterogeneity and Burst. (arXiv:2303.09407v1 [q-fin.ST])
    In recent years, there have been quite a few attempts to apply intelligent techniques to financial trading, i.e., constructing automatic and intelligent trading framework based on historical stock price. Due to the unpredictable, uncertainty and volatile nature of financial market, researchers have also resorted to deep learning to construct the intelligent trading framework. In this paper, we propose to use CNN as the core functionality of such framework, because it is able to learn the spatial dependency (i.e., between rows and columns) of the input data. However, different with existing deep learning-based trading frameworks, we develop novel normalization process to prepare the stock data. In particular, we first empirically observe that the stock data is intrinsically heterogeneous and bursty, and then validate the heterogeneity and burst nature of stock data from a statistical perspective. Next, we design the data normalization method in a way such that the data heterogeneity is preserved and bursty events are suppressed. We verify out developed CNN-based trading framework plus our new normalization method on 29 stocks. Experiment results show that our approach can outperform other comparing approaches.
    Geometric Analysis of Noisy Low-rank Matrix Recovery in the Exact Parameterized and the Overparameterized Regimes. (arXiv:2105.08232v3 [math.OC] UPDATED)
    The matrix sensing problem is an important low-rank optimization problem that has found a wide range of applications, such as matrix completion, phase synchornization/retrieval, robust PCA, and power system state estimation. In this work, we focus on the general matrix sensing problem with linear measurements that are corrupted by random noise. We investigate the scenario where the search rank $r$ is equal to the true rank $r^*$ of the unknown ground truth (the exact parametrized case), as well as the scenario where $r$ is greater than $r^*$ (the overparametrized case). We quantify the role of the restricted isometry property (RIP) in shaping the landscape of the non-convex factorized formulation and assisting with the success of local search algorithms. First, we develop a global guarantee on the maximum distance between an arbitrary local minimizer of the non-convex problem and the ground truth under the assumption that the RIP constant is smaller than $1/(1+\sqrt{r^*/r})$. We then present a local guarantee for problems with an arbitrary RIP constant, which states that any local minimizer is either considerably close to the ground truth or far away from it. More importantly, we prove that this noisy, overparametrized problem exhibits the strict saddle property, which leads to the global convergence of perturbed gradient descent algorithm in polynomial time. The results of this work provide a comprehensive understanding of the geometric landscape of the matrix sensing problem in the noisy and overparametrized regime.
    PyVBMC: Efficient Bayesian inference in Python. (arXiv:2303.09519v1 [stat.ML])
    PyVBMC is a Python implementation of the Variational Bayesian Monte Carlo (VBMC) algorithm for posterior and model inference for black-box computational models (Acerbi, 2018, 2020). VBMC is an approximate inference method designed for efficient parameter estimation and model assessment when model evaluations are mildly-to-very expensive (e.g., a second or more) and/or noisy. Specifically, VBMC computes: - a flexible (non-Gaussian) approximate posterior distribution of the model parameters, from which statistics and posterior samples can be easily extracted; - an approximation of the model evidence or marginal likelihood, a metric used for Bayesian model selection. PyVBMC can be applied to any computational or statistical model with up to roughly 10-15 continuous parameters, with the only requirement that the user can provide a Python function that computes the target log likelihood of the model, or an approximation thereof (e.g., an estimate of the likelihood obtained via simulation or Monte Carlo methods). PyVBMC is particularly effective when the model takes more than about a second per evaluation, with dramatic speed-ups of 1-2 orders of magnitude when compared to traditional approximate inference methods. Extensive benchmarks on both artificial test problems and a large number of real models from the computational sciences, particularly computational and cognitive neuroscience, show that VBMC generally - and often vastly - outperforms alternative methods for sample-efficient Bayesian inference, and is applicable to both exact and simulator-based models (Acerbi, 2018, 2019, 2020). PyVBMC brings this state-of-the-art inference algorithm to Python, along with an easy-to-use Pythonic interface for running the algorithm and manipulating and visualizing its results.
    Goal-conditioned Offline Reinforcement Learning through State Space Partitioning. (arXiv:2303.09367v1 [cs.LG])
    Offline reinforcement learning (RL) aims to infer sequential decision policies using only offline datasets. This is a particularly difficult setup, especially when learning to achieve multiple different goals or outcomes under a given scenario with only sparse rewards. For offline learning of goal-conditioned policies via supervised learning, previous work has shown that an advantage weighted log-likelihood loss guarantees monotonic policy improvement. In this work we argue that, despite its benefits, this approach is still insufficient to fully address the distribution shift and multi-modality problems. The latter is particularly severe in long-horizon tasks where finding a unique and optimal policy that goes from a state to the desired goal is challenging as there may be multiple and potentially conflicting solutions. To tackle these challenges, we propose a complementary advantage-based weighting scheme that introduces an additional source of inductive bias: given a value-based partitioning of the state space, the contribution of actions expected to lead to target regions that are easier to reach, compared to the final goal, is further increased. Empirically, we demonstrate that the proposed approach, Dual-Advantage Weighted Offline Goal-conditioned RL (DAWOG), outperforms several competing offline algorithms in commonly used benchmarks. Analytically, we offer a guarantee that the learnt policy is never worse than the underlying behaviour policy.
    GLASU: A Communication-Efficient Algorithm for Federated Learning with Vertically Distributed Graph Data. (arXiv:2303.09531v1 [cs.LG])
    Vertical federated learning (VFL) is a distributed learning paradigm, where computing clients collectively train a model based on the partial features of the same set of samples they possess. Current research on VFL focuses on the case when samples are independent, but it rarely addresses an emerging scenario when samples are interrelated through a graph. For graph-structured data, graph neural networks (GNNs) are competitive machine learning models, but a naive implementation in the VFL setting causes a significant communication overhead. Moreover, the analysis of the training is faced with a challenge caused by the biased stochastic gradients. In this paper, we propose a model splitting method that splits a backbone GNN across the clients and the server and a communication-efficient algorithm, GLASU, to train such a model. GLASU adopts lazy aggregation and stale updates to skip aggregation when evaluating the model and skip feature exchanges during training, greatly reducing communication. We offer a theoretical analysis and conduct extensive numerical experiments on real-world datasets, showing that the proposed algorithm effectively trains a GNN model, whose performance matches that of the backbone GNN when trained in a centralized manner.
    Fairness-aware Differentially Private Collaborative Filtering. (arXiv:2303.09527v1 [cs.IR])
    Recently, there has been an increasing adoption of differential privacy guided algorithms for privacy-preserving machine learning tasks. However, the use of such algorithms comes with trade-offs in terms of algorithmic fairness, which has been widely acknowledged. Specifically, we have empirically observed that the classical collaborative filtering method, trained by differentially private stochastic gradient descent (DP-SGD), results in a disparate impact on user groups with respect to different user engagement levels. This, in turn, causes the original unfair model to become even more biased against inactive users. To address the above issues, we propose \textbf{DP-Fair}, a two-stage framework for collaborative filtering based algorithms. Specifically, it combines differential privacy mechanisms with fairness constraints to protect user privacy while ensuring fair recommendations. The experimental results, based on Amazon datasets, and user history logs collected from Etsy, one of the largest e-commerce platforms, demonstrate that our proposed method exhibits superior performance in terms of both overall accuracy and user group fairness on both shallow and deep recommendation models compared to vanilla DP-SGD.
    Speech Modeling with a Hierarchical Transformer Dynamical VAE. (arXiv:2303.09404v1 [eess.AS])
    The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a corresponding sequence of latent vectors. In almost all the DVAEs of the literature, the temporal dependencies within each sequence and across the two sequences are modeled with recurrent neural networks. In this paper, we propose to model speech signals with the Hierarchical Transformer DVAE (HiT-DVAE), which is a DVAE with two levels of latent variable (sequence-wise and frame-wise) and in which the temporal dependencies are implemented with the Transformer architecture. We show that HiT-DVAE outperforms several other DVAEs for speech spectrogram modeling, while enabling a simpler training procedure, revealing its high potential for downstream low-level speech processing tasks such as speech enhancement.
    Exploring Resiliency to Natural Image Corruptions in Deep Learning using Design Diversity. (arXiv:2303.09283v1 [cs.LG])
    In this paper, we investigate the relationship between diversity metrics, accuracy, and resiliency to natural image corruptions of Deep Learning (DL) image classifier ensembles. We investigate the potential of an attribution-based diversity metric to improve the known accuracy-diversity trade-off of the typical prediction-based diversity. Our motivation is based on analytical studies of design diversity that have shown that a reduction of common failure modes is possible if diversity of design choices is achieved. Using ResNet50 as a comparison baseline, we evaluate the resiliency of multiple individual DL model architectures against dataset distribution shifts corresponding to natural image corruptions. We compare ensembles created with diverse model architectures trained either independently or through a Neural Architecture Search technique and evaluate the correlation of prediction-based and attribution-based diversity to the final ensemble accuracy. We evaluate a set of diversity enforcement heuristics based on negative correlation learning to assess the final ensemble resilience to natural image corruptions and inspect the resulting prediction, activation, and attribution diversity. Our key observations are: 1) model architecture is more important for resiliency than model size or model accuracy, 2) attribution-based diversity is less negatively correlated to the ensemble accuracy than prediction-based diversity, 3) a balanced loss function of individual and ensemble accuracy creates more resilient ensembles for image natural corruptions, 4) architecture diversity produces more diversity in all explored diversity metrics: predictions, attributions, and activations.
    The Challenges of HTR Model Training: Feedback from the Project Donner le gout de l'archive a l'ere numerique. (arXiv:2212.11146v2 [cs.CV] UPDATED)
    The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and the practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most significant ways to improve the performance of our handwritten text recognition (HTR) models which are made to transcribe French handwriting dating from the 17th century. This article therefore reports on the impacts of creating transcribing protocols, using the language model at full scale and determining the best way to use base models in order to help increase the performance of HTR models. Combining all of these elements can indeed increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). This article also discusses some challenges regarding the collaborative nature of HTR platforms such as Transkribus and the way researchers can share their data generated in the process of creating or training handwritten text recognition models.
    Explaining Groups of Instances Counterfactually for XAI: A Use Case, Algorithm and User Study for Group-Counterfactuals. (arXiv:2303.09297v1 [cs.AI])
    Counterfactual explanations are an increasingly popular form of post hoc explanation due to their (i) applicability across problem domains, (ii) proposed legal compliance (e.g., with GDPR), and (iii) reliance on the contrastive nature of human explanation. Although counterfactual explanations are normally used to explain individual predictive-instances, we explore a novel use case in which groups of similar instances are explained in a collective fashion using ``group counterfactuals'' (e.g., to highlight a repeating pattern of illness in a group of patients). These group counterfactuals meet a human preference for coherent, broad explanations covering multiple events/instances. A novel, group-counterfactual algorithm is proposed to generate high-coverage explanations that are faithful to the to-be-explained model. This explanation strategy is also evaluated in a large, controlled user study (N=207), using objective (i.e., accuracy) and subjective (i.e., confidence, explanation satisfaction, and trust) psychological measures. The results show that group counterfactuals elicit modest but definite improvements in people's understanding of an AI system. The implications of these findings for counterfactual methods and for XAI are discussed.
    Interpretability from a new lens: Integrating Stratification and Domain knowledge for Biomedical Applications. (arXiv:2303.09322v1 [cs.LG])
    The use of machine learning (ML) techniques in the biomedical field has become increasingly important, particularly with the large amounts of data generated by the aftermath of the COVID-19 pandemic. However, due to the complex nature of biomedical datasets and the use of black-box ML models, a lack of trust and adoption by domain experts can arise. In response, interpretable ML (IML) approaches have been developed, but the curse of dimensionality in biomedical datasets can lead to model instability. This paper proposes a novel computational strategy for the stratification of biomedical problem datasets into k-fold cross-validation (CVs) and integrating domain knowledge interpretation techniques embedded into the current state-of-the-art IML frameworks. This approach can improve model stability, establish trust, and provide explanations for outcomes generated by trained IML models. Specifically, the model outcome, such as aggregated feature weight importance, can be linked to further domain knowledge interpretations using techniques like pathway functional enrichment, drug targeting, and repurposing databases. Additionally, involving end-users and clinicians in focus group discussions before and after the choice of IML framework can help guide testable hypotheses, improve performance metrics, and build trustworthy and usable IML solutions in the biomedical field. Overall, this study highlights the potential of combining advanced computational techniques with domain knowledge interpretation to enhance the effectiveness of IML solutions in the context of complex biomedical datasets.
    Steering Prototype with Prompt-tuning for Rehearsal-free Continual Learning. (arXiv:2303.09447v1 [cs.LG])
    Prototype, as a representation of class embeddings, has been explored to reduce memory footprint or mitigate forgetting for continual learning scenarios. However, prototype-based methods still suffer from abrupt performance deterioration due to semantic drift and prototype interference. In this study, we propose Contrastive Prototypical Prompt (CPP) and show that task-specific prompt-tuning, when optimized over a contrastive learning objective, can effectively address both obstacles and significantly improve the potency of prototypes. Our experiments demonstrate that CPP excels in four challenging class-incremental learning benchmarks, resulting in 4% to 6% absolute improvements over state-of-the-art methods. Moreover, CPP does not require a rehearsal buffer and it largely bridges the performance gap between continual learning and offline joint-learning, showcasing a promising design scheme for continual learning systems under a Transformer architecture.
    $P+$: Extended Textual Conditioning in Text-to-Image Generation. (arXiv:2303.09522v1 [cs.CV])
    We introduce an Extended Textual Conditioning space in text-to-image models, referred to as $P+$. This space consists of multiple textual conditions, derived from per-layer prompts, each corresponding to a layer of the denoising U-net of the diffusion model. We show that the extended space provides greater disentangling and control over image synthesis. We further introduce Extended Textual Inversion (XTI), where the images are inverted into $P+$, and represented by per-layer tokens. We show that XTI is more expressive and precise, and converges faster than the original Textual Inversion (TI) space. The extended inversion method does not involve any noticeable trade-off between reconstruction and editability and induces more regular inversions. We conduct a series of extensive experiments to analyze and understand the properties of the new space, and to showcase the effectiveness of our method for personalizing text-to-image models. Furthermore, we utilize the unique properties of this space to achieve previously unattainable results in object-style mixing using text-to-image models. Project page: https://prompt-plus.github.io
    On the Existence of a Complexity in Fixed Budget Bandit Identification. (arXiv:2303.09468v1 [stat.ML])
    In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time. It then answers a query about the set of distributions. A good algorithm will have a small probability of error. While that probability decreases exponentially with the final time, the best attainable rate is not known precisely for most identification tasks. We show that if a fixed budget task admits a complexity, defined as a lower bound on the probability of error which is attained by a single algorithm on all bandit problems, then that complexity is determined by the best non-adaptive sampling procedure for that problem. We show that there is no such complexity for several fixed budget identification tasks including Bernoulli best arm identification with two arms: there is no single algorithm that attains everywhere the best possible rate.
    ODIN: On-demand Data Formulation to Mitigate Dataset Lock-in. (arXiv:2303.06832v2 [cs.LG] UPDATED)
    ODIN is an innovative approach that addresses the problem of dataset constraints by integrating generative AI models. Traditional zero-shot learning methods are constrained by the training dataset. To fundamentally overcome this limitation, ODIN attempts to mitigate the dataset constraints by generating on-demand datasets based on user requirements. ODIN consists of three main modules: a prompt generator, a text-to-image generator, and an image post-processor. To generate high-quality prompts and images, we adopted a large language model (e.g., ChatGPT), and a text-to-image diffusion model (e.g., Stable Diffusion), respectively. We evaluated ODIN on various datasets in terms of model accuracy and data diversity to demonstrate its potential, and conducted post-experiments for further investigation. Overall, ODIN is a feasible approach that enables Al to learn unseen knowledge beyond the training dataset.
    Cryptocurrency Price Prediction using Twitter Sentiment Analysis. (arXiv:2303.09397v1 [q-fin.ST])
    The cryptocurrency ecosystem has been the centre of discussion on many social media platforms, following its noted volatility and varied opinions. Twitter is rapidly being utilised as a news source and a medium for bitcoin discussion. Our algorithm seeks to use historical prices and sentiment of tweets to forecast the price of Bitcoin. In this study, we develop an end-to-end model that can forecast the sentiment of a set of tweets (using a Bidirectional Encoder Representations from Transformers - based Neural Network Model) and forecast the price of Bitcoin (using Gated Recurrent Unit) using the predicted sentiment and other metrics like historical cryptocurrency price data, tweet volume, a user's following, and whether or not a user is verified. The sentiment prediction gave a Mean Absolute Percentage Error of 9.45%, an average of real-time data, and test data. The mean absolute percent error for the price prediction was 3.6%.
    The NCI Imaging Data Commons as a platform for reproducible research in computational pathology. (arXiv:2303.09354v1 [cs.CV])
    Objective: Reproducibility is critical for translating machine learning-based (ML) solutions in computational pathology (CompPath) into practice. However, an increasing number of studies report difficulties in reproducing ML results. The NCI Imaging Data Commons (IDC) is a public repository of >120 cancer image collections, including >38,000 whole-slide images (WSIs), that is designed to be used with cloud-based ML services. Here, we explore the potential of the IDC to facilitate reproducibility of CompPath research. Materials and Methods: The IDC realizes the FAIR principles: All images are encoded according to the DICOM standard, persistently identified, discoverable via rich metadata, and accessible via open tools. Taking advantage of this, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets from the IDC. To assess reproducibility, the experiments were run multiple times with independent but identically configured sessions of common ML services. Results: The AUC values of different runs of the same experiment were generally consistent and in the same order of magnitude as a similar, previously published study. However, there were occasional small variations in AUC values of up to 0.044, indicating a practical limit to reproducibility. Discussion and conclusion: By realizing the FAIR principles, the IDC enables other researchers to reuse exactly the same datasets. Cloud-based ML services enable others to run CompPath experiments in an identically configured computing environment without having to own high-performance hardware. The combination of both makes it possible to approach the reproducibility limit.
    Generative Adversarial Network for Personalized Art Therapy in Melanoma Disease Management. (arXiv:2303.09232v1 [eess.IV])
    Melanoma is the most lethal type of skin cancer. Patients are vulnerable to mental health illnesses which can reduce the effectiveness of the cancer treatment and the patients adherence to drug plans. It is crucial to preserve the mental health of patients while they are receiving treatment. However, current art therapy approaches are not personal and unique to the patient. We aim to provide a well-trained image style transfer model that can quickly generate unique art from personal dermoscopic melanoma images as an additional tool for art therapy in disease management of melanoma. Visual art appreciation as a common form of art therapy in disease management that measurably reduces the degree of psychological distress. We developed a network based on the cycle-consistent generative adversarial network for style transfer that generates personalized and unique artworks from dermoscopic melanoma images. We developed a model that converts melanoma images into unique flower-themed artworks that relate to the shape of the lesion and are therefore personal to the patient. Further, we altered the initial framework and made comparisons and evaluations of the results. With this, we increased the options in the toolbox for art therapy in disease management of melanoma. The development of an easy-to-use user interface ensures the availability of the approach to stakeholders. The transformation of melanoma into flower-themed artworks is achieved by the proposed model and the graphical user interface. This contribution opens a new field of GANs in art therapy and could lead to more personalized disease management.
    Controlling High-Dimensional Data With Sparse Input. (arXiv:2303.09446v1 [eess.AS])
    We address the problem of human-in-the-loop control for generating highly-structured data. This task is challenging because existing generative models lack an efficient interface through which users can modify the output. Users have the option to either manually explore a non-interpretable latent space, or to laboriously annotate the data with conditioning labels. To solve this, we introduce a novel framework whereby an encoder maps a sparse, human interpretable control space onto the latent space of a generative model. We apply this framework to the task of controlling prosody in text-to-speech synthesis. We propose a model, called Multiple-Instance CVAE (MICVAE), that is specifically designed to encode sparse prosodic features and output complete waveforms. We show empirically that MICVAE displays desirable qualities of a sparse human-in-the-loop control mechanism: efficiency, robustness, and faithfulness. With even a very small number of input values (~4), MICVAE enables users to improve the quality of the output significantly, in terms of listener preference (4:1).
    Learning Spatio-Temporal Aggregations for Large-Scale Capacity Expansion Problems. (arXiv:2303.08996v1 [cs.LG])
    Effective investment planning decisions are crucial to ensure cyber-physical infrastructures satisfy performance requirements over an extended time horizon. Computing these decisions often requires solving Capacity Expansion Problems (CEPs). In the context of regional-scale energy systems, these problems are prohibitively expensive to solve due to large network sizes, heterogeneous node characteristics, and a large number of operational periods. To maintain tractability, traditional approaches aggregate network nodes and/or select a set of representative time periods. Often, these reductions do not capture supply-demand variations that crucially impact CEP costs and constraints, leading to suboptimal decisions. Here, we propose a novel graph convolutional autoencoder approach for spatio-temporal aggregation of a generic CEP with heterogeneous nodes (CEPHN). Our architecture leverages graph pooling to identify nodes with similar characteristics and minimizes a multi-objective loss function. This loss function is tailored to induce desirable spatial and temporal aggregations with regard to tractability and optimality. In particular, the output of the graph pooling provides a spatial aggregation while clustering the low-dimensional encoded representations yields a temporal aggregation. We apply our approach to generation expansion planning of a coupled 88-node power and natural gas system in New England. The resulting aggregation leads to a simpler CEPHN with 6 nodes and a small set of representative days selected from one year. We evaluate aggregation outcomes over a range of hyperparameters governing the loss function and compare resulting upper bounds on the original problem with those obtained using benchmark methods. We show that our approach provides upper bounds that are 33% (resp. 10%) lower those than obtained from benchmark spatial (resp. temporal) aggregation approaches.
    Unsupervised domain adaptation by learning using privileged information. (arXiv:2303.09350v1 [cs.LG])
    Successful unsupervised domain adaptation (UDA) is guaranteed only under strong assumptions such as covariate shift and overlap between input domains. The latter is often violated in high-dimensional applications such as image classification which, despite this challenge, continues to serve as inspiration and benchmark for algorithm development. In this work, we show that access to side information about examples from the source and target domains can help relax these assumptions and increase sample efficiency in learning, at the cost of collecting a richer variable set. We call this domain adaptation by learning using privileged information (DALUPI). Tailored for this task, we propose a simple two-stage learning algorithm inspired by our analysis and a practical end-to-end algorithm for multi-label image classification. In a suite of experiments, including an application to medical image analysis, we demonstrate that incorporating privileged information in learning can reduce errors in domain transfer compared to classical learning.
    Combining Distance to Class Centroids and Outlier Discounting for Improved Learning with Noisy Labels. (arXiv:2303.09470v1 [cs.LG])
    In this paper, we propose a new approach for addressing the challenge of training machine learning models in the presence of noisy labels. By combining a clever usage of distance to class centroids in the items' latent space with a discounting strategy to reduce the importance of samples far away from all the class centroids (i.e., outliers), our method effectively addresses the issue of noisy labels. Our approach is based on the idea that samples farther away from their respective class centroid in the early stages of training are more likely to be noisy. We demonstrate the effectiveness of our method through extensive experiments on several popular benchmark datasets. Our results show that our approach outperforms the state-of-the-art in this area, achieving significant improvements in classification accuracy when the dataset contains noisy labels.
    VFP: Converting Tabular Data for IIoT into Images Considering Correlations of Attributes for Convolutional Neural Networks. (arXiv:2303.09068v1 [cs.CV])
    For tabular data generated from IIoT devices, traditional machine learning (ML) techniques based on the decision tree algorithm have been employed. However, these methods have limitations in processing tabular data where real number attributes dominate. To address this issue, DeepInsight, REFINED, and IGTD were proposed to convert tabular data into images for utilizing convolutional neural networks (CNNs). They gather similar features in some specific spots of an image to make the converted image look like an actual image. Gathering similar features contrasts with traditional ML techniques for tabular data, which drops some highly correlated attributes to avoid overfitting. Also, previous converting methods fixed the image size, and there are wasted or insufficient pixels according to the number of attributes of tabular data. Therefore, this paper proposes a new converting method, Vortex Feature Positioning (VFP). VFP considers the correlation of features and places similar features far away from each. Features are positioned in the vortex shape from the center of an image, and the number of attributes determines the image size. VFP shows better test performance than traditional ML techniques for tabular data and previous converting methods in five datasets: Iris, Wine, Dry Bean, Epileptic Seizure, and SECOM, which have differences in the number of attributes.
    CS-TGN: Community Search via Temporal Graph Neural Networks. (arXiv:2303.08964v1 [cs.SI])
    Searching for local communities is an important research challenge that allows for personalized community discovery and supports advanced data analysis in various complex networks, such as the World Wide Web, social networks, and brain networks. The evolution of these networks over time has motivated several recent studies to identify local communities in temporal networks. Given any query nodes, Community Search aims to find a densely connected subgraph containing query nodes. However, existing community search approaches in temporal networks have two main limitations: (1) they adopt pre-defined subgraph patterns to model communities, which cannot find communities that do not conform to these patterns in real-world networks, and (2) they only use the aggregation of disjoint structural information to measure quality, missing the dynamic of connections and temporal properties. In this paper, we propose a query-driven Temporal Graph Convolutional Network (CS-TGN) that can capture flexible community structures by learning from the ground-truth communities in a data-driven manner. CS-TGN first combines the local query-dependent structure and the global graph embedding in each snapshot of the network and then uses a GRU cell with contextual attention to learn the dynamics of interactions and update node embeddings over time. We demonstrate how this model can be used for interactive community search in an online setting, allowing users to evaluate the found communities and provide feedback. Experiments on real-world temporal graphs with ground-truth communities validate the superior quality of the solutions obtained and the efficiency of our model in both temporal and interactive static settings.
    Improving Automated Hemorrhage Detection in Sparse-view Computed Tomography via Deep Convolutional Neural Network based Artifact Reduction. (arXiv:2303.09340v1 [eess.IV])
    Intracranial hemorrhage poses a serious health problem requiring rapid and often intensive medical treatment. For diagnosis, a Cranial Computed Tomography (CCT) scan is usually performed. However, the increased health risk caused by radiation is a concern. The most important strategy to reduce this potential risk is to keep the radiation dose as low as possible and consistent with the diagnostic task. Sparse-view CT can be an effective strategy to reduce dose by reducing the total number of views acquired, albeit at the expense of image quality. In this work, we use a U-Net architecture to reduce artifacts from sparse-view CCTs, predicting fully sampled reconstructions from sparse-view ones. We evaluate the hemorrhage detectability in the predicted CCTs with a hemorrhage classification convolutional neural network, trained on fully sampled CCTs to detect and classify different sub-types of hemorrhages. Our results suggest that the automated classification and detection accuracy of hemorrhages in sparse-view CCTs can be improved substantially by the U-Net. This demonstrates the feasibility of rapid automated hemorrhage detection on low-dose CT data to assist radiologists in routine clinical practice.
    Learning Feasibility Constraints for Control Barrier Functions. (arXiv:2303.09403v1 [math.OC])
    It has been shown that optimizing quadratic costs while stabilizing affine control systems to desired (sets of) states subject to state and control constraints can be reduced to a sequence of Quadratic Programs (QPs) by using Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs). In this paper, we employ machine learning techniques to ensure the feasibility of these QPs, which is a challenging problem, especially for high relative degree constraints where High Order CBFs (HOCBFs) are required. To this end, we propose a sampling-based learning approach to learn a new feasibility constraint for CBFs; this constraint is then enforced by another HOCBF added to the QPs. The accuracy of the learned feasibility constraint is recursively improved by a recurrent training algorithm. We demonstrate the advantages of the proposed learning approach to constrained optimal control problems with specific focus on a robot control problem and on autonomous driving in an unknown environment.
    Challenges and Opportunities in Quantum Machine Learning. (arXiv:2303.09491v1 [quant-ph])
    At the intersection of machine learning and quantum computing, Quantum Machine Learning (QML) has the potential of accelerating data analysis, especially for quantum data, with applications for quantum materials, biochemistry, and high-energy physics. Nevertheless, challenges remain regarding the trainability of QML models. Here we review current methods and applications for QML. We highlight differences between quantum and classical machine learning, with a focus on quantum neural networks and quantum deep learning. Finally, we discuss opportunities for quantum advantage with QML.
    A Novel Autoencoders-LSTM Model for Stroke Outcome Prediction using Multimodal MRI Data. (arXiv:2303.09484v1 [cs.CV])
    Patient outcome prediction is critical in management of ischemic stroke. In this paper, a novel machine learning model is proposed for stroke outcome prediction using multimodal Magnetic Resonance Imaging (MRI). The proposed model consists of two serial levels of Autoencoders (AEs), where different AEs at level 1 are used for learning unimodal features from different MRI modalities and a AE at level 2 is used to combine the unimodal features into compressed multimodal features. The sequences of multimodal features of a given patient are then used by an LSTM network for predicting outcome score. The proposed AE2-LSTM model is proved to be an effective approach for better addressing the multimodality and volumetric nature of MRI data. Experimental results show that the proposed AE2-LSTM outperforms the existing state-of-the art models by achieving highest AUC=0.71 and lowest MAE=0.34.
    Feature Alignment by Uncertainty and Self-Training for Source-Free Unsupervised Domain Adaptation. (arXiv:2208.14888v2 [cs.CV] UPDATED)
    Most unsupervised domain adaptation (UDA) methods assume that labeled source images are available during model adaptation. However, this assumption is often infeasible owing to confidentiality issues or memory constraints on mobile devices. Some recently developed approaches do not require source images during adaptation, but they show limited performance on perturbed images. To address these problems, we propose a novel source-free UDA method that uses only a pre-trained source model and unlabeled target images. Our method captures the aleatoric uncertainty by incorporating data augmentation and trains the feature generator with two consistency objectives. The feature generator is encouraged to learn consistent visual features away from the decision boundaries of the head classifier. Thus, the adapted model becomes more robust to image perturbations. Inspired by self-supervised learning, our method promotes inter-space alignment between the prediction space and the feature space while incorporating intra-space consistency within the feature space to reduce the domain gap between the source and target domains. We also consider epistemic uncertainty to boost the model adaptation performance. Extensive experiments on popular UDA benchmark datasets demonstrate that the proposed source-free method is comparable or even superior to vanilla UDA methods. Moreover, the adapted models show more robust results when input images are perturbed.
    Stock Price Prediction Using Temporal Graph Model with Value Chain Data. (arXiv:2303.09406v1 [q-fin.ST])
    Stock price prediction is a crucial element in financial trading as it allows traders to make informed decisions about buying, selling, and holding stocks. Accurate predictions of future stock prices can help traders optimize their trading strategies and maximize their profits. In this paper, we introduce a neural network-based stock return prediction method, the Long Short-Term Memory Graph Convolutional Neural Network (LSTM-GCN) model, which combines the Graph Convolutional Network (GCN) and Long Short-Term Memory (LSTM) Cells. Specifically, the GCN is used to capture complex topological structures and spatial dependence from value chain data, while the LSTM captures temporal dependence and dynamic changes in stock returns data. We evaluated the LSTM-GCN model on two datasets consisting of constituents of Eurostoxx 600 and S&P 500. Our experiments demonstrate that the LSTM-GCN model can capture additional information from value chain data that are not fully reflected in price data, and the predictions outperform baseline models on both datasets.
    Learning Cross-lingual Visual Speech Representations. (arXiv:2303.09455v1 [cs.CL])
    Cross-lingual self-supervised learning has been a growing research topic in the last few years. However, current works only explored the use of audio signals to create representations. In this work, we study cross-lingual self-supervised visual representation learning. We use the recently-proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled multilingual data, and then fine-tune the visual model on labelled transcriptions. Our experiments show that: (1) multi-lingual models with more data outperform monolingual ones, but, when keeping the amount of data fixed, monolingual models tend to reach better performance; (2) multi-lingual outperforms English-only pre-training; (3) using languages which are more similar yields better results; and (4) fine-tuning on unseen languages is competitive to using the target language in the pre-training set. We hope our study inspires future research on non-English-only speech representation learning.
    Gradient flow on extensive-rank positive semi-definite matrix denoising. (arXiv:2303.09474v1 [stat.ML])
    In this work, we present a new approach to analyze the gradient flow for a positive semi-definite matrix denoising problem in an extensive-rank and high-dimensional regime. We use recent linear pencil techniques of random matrix theory to derive fixed point equations which track the complete time evolution of the matrix-mean-square-error of the problem. The predictions of the resulting fixed point equations are validated by numerical experiments. In this short note we briefly illustrate a few predictions of our formalism by way of examples, and in particular we uncover continuous phase transitions in the extensive-rank and high-dimensional regime, which connect to the classical phase transitions of the low-rank problem in the appropriate limit. The formalism has much wider applicability than shown in this communication.
    Adaptive Modeling of Uncertainties for Traffic Forecasting. (arXiv:2303.09273v1 [cs.LG])
    Deep neural networks (DNNs) have emerged as a dominant approach for developing traffic forecasting models. These models are typically trained to minimize error on averaged test cases and produce a single-point prediction, such as a scalar value for traffic speed or travel time. However, single-point predictions fail to account for prediction uncertainty that is critical for many transportation management scenarios, such as determining the best- or worst-case arrival time. We present QuanTraffic, a generic framework to enhance the capability of an arbitrary DNN model for uncertainty modeling. QuanTraffic requires little human involvement and does not change the base DNN architecture during deployment. Instead, it automatically learns a standard quantile function during the DNN model training to produce a prediction interval for the single-point prediction. The prediction interval defines a range where the true value of the traffic prediction is likely to fall. Furthermore, QuanTraffic develops an adaptive scheme that dynamically adjusts the prediction interval based on the location and prediction window of the test input. We evaluated QuanTraffic by applying it to five representative DNN models for traffic forecasting across seven public datasets. We then compared QuanTraffic against five uncertainty quantification methods. Compared to the baseline uncertainty modeling techniques, QuanTraffic with base DNN architectures delivers consistently better and more robust performance than the existing ones on the reported datasets.
    Enhanced detection of the presence and severity of COVID-19 from CT scans using lung segmentation. (arXiv:2303.09440v1 [eess.IV])
    Improving automated analysis of medical imaging will provide clinicians more options in providing care for patients. The 2023 AI-enabled Medical Image Analysis Workshop and Covid-19 Diagnosis Competition (AI-MIA-COV19D) provides an opportunity to test and refine machine learning methods for detecting the presence and severity of COVID-19 in patients from CT scans. This paper presents version 2 of Cov3d, a deep learning model submitted in the 2022 competition. The model has been improved through a preprocessing step which segments the lungs in the CT scan and crops the input to this region. It results in a validation macro F1 score for predicting the presence of COVID-19 in the CT scans at 92.2% which is significantly above the baseline of 74%. It gives a macro F1 score for predicting the severity of COVID-19 on the validation set for task 2 as 67% which is above the baseline of 38%.
    Image Classifiers Leak Sensitive Attributes About Their Classes. (arXiv:2303.09289v1 [cs.LG])
    Neural network-based image classifiers are powerful tools for computer vision tasks, but they inadvertently reveal sensitive attribute information about their classes, raising concerns about their privacy. To investigate this privacy leakage, we introduce the first Class Attribute Inference Attack (Caia), which leverages recent advances in text-to-image synthesis to infer sensitive attributes of individual classes in a black-box setting, while remaining competitive with related white-box attacks. Our extensive experiments in the face recognition domain show that Caia can accurately infer undisclosed sensitive attributes, such as an individual's hair color, gender and racial appearance, which are not part of the training labels. Interestingly, we demonstrate that adversarial robust models are even more vulnerable to such privacy leakage than standard models, indicating that a trade-off between robustness and privacy exists.
    The Scope of In-Context Learning for the Extraction of Medical Temporal Constraints. (arXiv:2303.09366v1 [cs.CL])
    Medications often impose temporal constraints on everyday patient activity. Violations of such medical temporal constraints (MTCs) lead to a lack of treatment adherence, in addition to poor health outcomes and increased healthcare expenses. These MTCs are found in drug usage guidelines (DUGs) in both patient education materials and clinical texts. Computationally representing MTCs in DUGs will advance patient-centric healthcare applications by helping to define safe patient activity patterns. We define a novel taxonomy of MTCs found in DUGs and develop a novel context-free grammar (CFG) based model to computationally represent MTCs from unstructured DUGs. Additionally, we release three new datasets with a combined total of N = 836 DUGs labeled with normalized MTCs. We develop an in-context learning (ICL) solution for automatically extracting and normalizing MTCs found in DUGs, achieving an average F1 score of 0.62 across all datasets. Finally, we rigorously investigate ICL model performance against a baseline model, across datasets and MTC types, and through in-depth error analysis.
    Maximum Margin Learning of t-SPNs for Cell Classification with Filtering. (arXiv:2303.09065v1 [cs.LG])
    An algorithm based on a deep probabilistic architecture referred to as a tree-structured sum-product network (t-SPN) is considered for cell classification. The t-SPN is constructed such that the unnormalized probability is represented as conditional probabilities of a subset of most similar cell classes. The constructed t-SPN architecture is learned by maximizing the margin, which is the difference in the conditional probability between the true and the most competitive false label. To enhance the generalization ability of the architecture, L2-regularization (REG) is considered along with the maximum margin (MM) criterion in the learning process. To highlight cell features, this paper investigates the effectiveness of two generic high-pass filters: ideal high-pass filtering and the Laplacian of Gaussian (LOG) filtering. On both HEp-2 and Feulgen benchmark datasets, the t-SPN architecture learned based on the max-margin criterion with regularization produced the highest accuracy rate compared to other state-of-the-art algorithms that include convolutional neural network (CNN) based algorithms. The ideal high-pass filter was more effective on the HEp-2 dataset, which is based on immunofluorescence staining, while the LOG was more effective on the Feulgen dataset, which is based on Feulgen staining.
    Learning Logic Specifications for Soft Policy Guidance in POMCP. (arXiv:2303.09172v1 [cs.AI])
    Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs). It allows scaling to large state spaces by computing an approximation of the optimal policy locally and online, using a Monte Carlo Tree Search based strategy. However, POMCP suffers from sparse reward function, namely, rewards achieved only when the final goal is reached, particularly in environments with large state spaces and long horizons. Recently, logic specifications have been integrated into POMCP to guide exploration and to satisfy safety requirements. However, such policy-related rules require manual definition by domain experts, especially in real-world scenarios. In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions, i.e., sets of belief-action pairs generated by the planner. Specifically, we learn rules expressed in the paradigm of answer set programming. We then integrate them inside POMCP to provide soft policy bias toward promising actions. In the context of two benchmark scenarios, rocksample and battery, we show that the integration of learned rules from small task instances can improve performance with fewer Monte Carlo simulations and in larger task instances. We make our modified version of POMCP publicly available at https://github.com/GiuMaz/pomcp_clingo.git.
    Embedding Theory of Reservoir Computing and Reducing Reservoir Network Using Time Delays. (arXiv:2303.09042v1 [cs.LG])
    Reservoir computing (RC), a particular form of recurrent neural network, is under explosive development due to its exceptional efficacy and high performance in reconstruction or/and prediction of complex physical systems. However, the mechanism triggering such effective applications of RC is still unclear, awaiting deep and systematic exploration. Here, combining the delayed embedding theory with the generalized embedding theory, we rigorously prove that RC is essentially a high dimensional embedding of the original input nonlinear dynamical system. Thus, using this embedding property, we unify into a universal framework the standard RC and the time-delayed RC where we novelly introduce time delays only into the network's output layer, and we further find a trade-off relation between the time delays and the number of neurons in RC. Based on this finding, we significantly reduce the network size of RC for reconstructing and predicting some representative physical systems, and, more surprisingly, only using a single neuron reservoir with time delays is sometimes sufficient for achieving those tasks.
    High-Dimensional Penalized Bernstein Support Vector Machines. (arXiv:2303.09066v1 [stat.ML])
    The support vector machines (SVM) is a powerful classifier used for binary classification to improve the prediction accuracy. However, the non-differentiability of the SVM hinge loss function can lead to computational difficulties in high dimensional settings. To overcome this problem, we rely on Bernstein polynomial and propose a new smoothed version of the SVM hinge loss called the Bernstein support vector machine (BernSVM), which is suitable for the high dimension $p >> n$ regime. As the BernSVM objective loss function is of the class $C^2$, we propose two efficient algorithms for computing the solution of the penalized BernSVM. The first algorithm is based on coordinate descent with maximization-majorization (MM) principle and the second one is IRLS-type algorithm (iterative re-weighted least squares). Under standard assumptions, we derive a cone condition and a restricted strong convexity to establish an upper bound for the weighted Lasso BernSVM estimator. Using a local linear approximation, we extend the latter result to penalized BernSVM with non convex penalties SCAD and MCP. Our bound holds with high probability and achieves a rate of order $\sqrt{s\log(p)/n}$, where $s$ is the number of active features. Simulation studies are considered to illustrate the prediction accuracy of BernSVM to its competitors and also to compare the performance of the two algorithms in terms of computational timing and error estimation. The use of the proposed method is illustrated through analysis of three large-scale real data examples.
    On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits. (arXiv:2303.09390v1 [cs.LG])
    We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $\zeta$ is dominated by $\tilde O (\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O (d^2/\Delta)$ as in the well-specified setting up to logarithmic factors. In addition, we show that an existing algorithm SupLinUCB (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without the knowledge of sub-optimality gap $\Delta$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $\zeta \leq \tilde O(\Delta / \sqrt{d})$; and (2) it is not efficiently learnable when $\zeta \geq \tilde \Omega({\Delta} / {\sqrt{d}})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.
    Bayesian Generalization Error in Linear Neural Networks with Concept Bottleneck Structure and Multitask Formulation. (arXiv:2303.09154v1 [stat.ML])
    Concept bottleneck model (CBM) is a ubiquitous method that can interpret neural networks using concepts. In CBM, concepts are inserted between the output layer and the last intermediate layer as observable values. This helps in understanding the reason behind the outputs generated by the neural networks: the weights corresponding to the concepts from the last hidden layer to the output layer. However, it has not yet been possible to understand the behavior of the generalization error in CBM since a neural network is a singular statistical model in general. When the model is singular, a one to one map from the parameters to probability distributions cannot be created. This non-identifiability makes it difficult to analyze the generalization performance. In this study, we mathematically clarify the Bayesian generalization error and free energy of CBM when its architecture is three-layered linear neural networks. We also consider a multitask problem where the neural network outputs not only the original output but also the concepts. The results show that CBM drastically changes the behavior of the parameter region and the Bayesian generalization error in three-layered linear neural networks as compared with the standard version, whereas the multitask formulation does not.
    ExoplANNET: A deep learning algorithm to detect and identify planetary signals in radial velocity data. (arXiv:2303.09335v1 [astro-ph.EP])
    The detection of exoplanets with the radial velocity method consists in detecting variations of the stellar velocity caused by an unseen sub-stellar companion. Instrumental errors, irregular time sampling, and different noise sources originating in the intrinsic variability of the star can hinder the interpretation of the data, and even lead to spurious detections. In recent times, work began to emerge in the field of extrasolar planets that use Machine Learning algorithms, some with results that exceed those obtained with the traditional techniques in the field. We seek to explore the scope of the neural networks in the radial velocity method, in particular for exoplanet detection in the presence of correlated noise of stellar origin. In this work, a neural network is proposed to replace the computation of the significance of the signal detected with the radial velocity method and to classify it as of planetary origin or not. The algorithm is trained using synthetic data of systems with and without planetary companions. We injected realistic correlated noise in the simulations, based on previous studies of the behaviour of stellar activity. The performance of the network is compared to the traditional method based on null hypothesis significance testing. The network achieves 28 % fewer false positives. The improvement is observed mainly in the detection of small-amplitude signals associated with low-mass planets. In addition, its execution time is five orders of magnitude faster than the traditional method. The superior performance exhibited by the algorithm has only been tested on simulated radial velocity data so far. Although in principle it should be straightforward to adapt it for use in real time series, its performance has to be tested thoroughly. Future work should permit evaluating its potential for adoption as a valuable tool for exoplanet detection.
    The Tiny Time-series Transformer: Low-latency High-throughput Classification of Astronomical Transients using Deep Model Compression. (arXiv:2303.08951v1 [astro-ph.IM])
    A new golden age in astronomy is upon us, dominated by data. Large astronomical surveys are broadcasting unprecedented rates of information, demanding machine learning as a critical component in modern scientific pipelines to handle the deluge of data. The upcoming Legacy Survey of Space and Time (LSST) of the Vera C. Rubin Observatory will raise the big-data bar for time-domain astronomy, with an expected 10 million alerts per-night, and generating many petabytes of data over the lifetime of the survey. Fast and efficient classification algorithms that can operate in real-time, yet robustly and accurately, are needed for time-critical events where additional resources can be sought for follow-up analyses. In order to handle such data, state-of-the-art deep learning architectures coupled with tools that leverage modern hardware accelerators are essential. We showcase how the use of modern deep compression methods can achieve a $18\times$ reduction in model size, whilst preserving classification performance. We also show that in addition to the deep compression techniques, careful choice of file formats can improve inference latency, and thereby throughput of alerts, on the order of $8\times$ for local processing, and $5\times$ in a live production setting. To test this in a live setting, we deploy this optimised version of the original time-series transformer, t2, into the community alert broking system of FINK on real Zwicky Transient Facility (ZTF) alert data, and compare throughput performance with other science modules that exist in FINK. The results shown herein emphasise the time-series transformer's suitability for real-time classification at LSST scale, and beyond, and introduce deep model compression as a fundamental tool for improving deploy-ability and scalable inference of deep learning models for transient classification.
    Copyright Protection and Accountability of Generative AI:Attack, Watermarking and Attribution. (arXiv:2303.09272v1 [cs.LG])
    Generative AI (e.g., Generative Adversarial Networks - GANs) has become increasingly popular in recent years. However, Generative AI introduces significant concerns regarding the protection of Intellectual Property Rights (IPR) (resp. model accountability) pertaining to images (resp. toxic images) and models (resp. poisoned models) generated. In this paper, we propose an evaluation framework to provide a comprehensive overview of the current state of the copyright protection measures for GANs, evaluate their performance across a diverse range of GAN architectures, and identify the factors that affect their performance and future research directions. Our findings indicate that the current IPR protection methods for input images, model watermarking, and attribution networks are largely satisfactory for a wide range of GANs. We highlight that further attention must be directed towards protecting training sets, as the current approaches fail to provide robust IPR protection and provenance tracing on training sets.
    Multi-modal Differentiable Unsupervised Feature Selection. (arXiv:2303.09381v1 [cs.LG])
    Multi-modal high throughput biological data presents a great scientific opportunity and a significant computational challenge. In multi-modal measurements, every sample is observed simultaneously by two or more sets of sensors. In such settings, many observed variables in both modalities are often nuisance and do not carry information about the phenomenon of interest. Here, we propose a multi-modal unsupervised feature selection framework: identifying informative variables based on coupled high-dimensional measurements. Our method is designed to identify features associated with two types of latent low-dimensional structures: (i) shared structures that govern the observations in both modalities and (ii) differential structures that appear in only one modality. To that end, we propose two Laplacian-based scoring operators. We incorporate the scores with differentiable gates that mask nuisance features and enhance the accuracy of the structure captured by the graph Laplacian. The performance of the new scheme is illustrated using synthetic and real datasets, including an extended biological application to single-cell multi-omics.
    Comparing Feature Importance and Rule Extraction for Interpretability on Text Data. (arXiv:2207.01420v1 [cs.LG] CROSS LISTED)
    Complex machine learning algorithms are used more and more often in critical tasks involving text data, leading to the development of interpretability methods. Among local methods, two families have emerged: those computing importance scores for each feature and those extracting simple logical rules. In this paper we show that using different methods can lead to unexpectedly different explanations, even when applied to simple models for which we would expect qualitative coincidence. To quantify this effect, we propose a new approach to compare explanations produced by different methods.
    Visual Prompting for Adversarial Robustness. (arXiv:2210.06284v2 [cs.CV] UPDATED)
    In this work, we leverage visual prompting (VP) to improve adversarial robustness of a fixed, pre-trained model at testing time. Compared to conventional adversarial defenses, VP allows us to design universal (i.e., data-agnostic) input prompting templates, which have plug-and-play capabilities at testing time to achieve desired model performance without introducing much computation overhead. Although VP has been successfully applied to improving model generalization, it remains elusive whether and how it can be used to defend against adversarial attacks. We investigate this problem and show that the vanilla VP approach is not effective in adversarial defense since a universal input prompt lacks the capacity for robust learning against sample-specific adversarial perturbations. To circumvent it, we propose a new VP method, termed Class-wise Adversarial Visual Prompting (C-AVP), to generate class-wise visual prompts so as to not only leverage the strengths of ensemble prompts but also optimize their interrelations to improve model robustness. Our experiments show that C-AVP outperforms the conventional VP method, with 2.1X standard accuracy gain and 2X robust accuracy gain. Compared to classical test-time defenses, C-AVP also yields a 42X inference time speedup.
    Model Based Explanations of Concept Drift. (arXiv:2303.09331v1 [cs.LG])
    The notion of concept drift refers to the phenomenon that the distribution generating the observed data changes over time. If drift is present, machine learning models can become inaccurate and need adjustment. While there do exist methods to detect concept drift or to adjust models in the presence of observed drift, the question of explaining drift, i.e., describing the potentially complex and high dimensional change of distribution in a human-understandable fashion, has hardly been considered so far. This problem is of importance since it enables an inspection of the most prominent characteristics of how and where drift manifests itself. Hence, it enables human understanding of the change and it increases acceptance of life-long learning models. In this paper, we present a novel technology characterizing concept drift in terms of the characteristic change of spatial features based on various explanation techniques. To do so, we propose a methodology to reduce the explanation of concept drift to an explanation of models that are trained in a suitable way extracting relevant information regarding the drift. This way a large variety of explanation schemes is available. Thus, a suitable method can be selected for the problem of drift explanation at hand. We outline the potential of this approach and demonstrate its usefulness in several examples.
    Patch-Token Aligned Bayesian Prompt Learning for Vision-Language Models. (arXiv:2303.09100v1 [cs.CV])
    For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tuning as a point estimation problem, may fail to describe diverse characteristics of categories and limit their applications. We introduce a Bayesian probabilistic resolution to prompt learning, where the label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize prompt learning with the visual knowledge and view images and the corresponding prompts as patch and token sets under optimal transport, which pushes the prompt tokens to faithfully capture the label-specific visual concepts, instead of overfitting the training categories. Moreover, the proposed model can also be straightforwardly extended to the conditional case where the instance-conditional prompts are generated to improve the generalizability. Extensive experiments on 15 datasets show promising transferability and generalization performance of our proposed model.
    Robust Evaluation of Diffusion-Based Adversarial Purification. (arXiv:2303.09051v1 [cs.CV])
    We question the current evaluation practice on diffusion-based purification methods. Diffusion-based purification methods aim to remove adversarial effects from an input data point at test time. The approach gains increasing attention as an alternative to adversarial training due to the disentangling between training and testing. Well-known white-box attacks are often employed to measure the robustness of the purification. However, it is unknown whether these attacks are the most effective for the diffusion-based purification since the attacks are often tailored for adversarial training. We analyze the current practices and provide a new guideline for measuring the robustness of purification methods against adversarial attacks. Based on our analysis, we further propose a new purification strategy showing competitive results against the state-of-the-art adversarial training approaches.
    Block-wise Bit-Compression of Transformer-based Models. (arXiv:2303.09184v1 [cs.CL])
    With the popularity of the recent Transformer-based models represented by BERT, GPT-3 and ChatGPT, there has been state-of-the-art performance in a range of natural language processing tasks. However, the massive computations, huge memory footprint, and thus high latency of Transformer-based models is an inevitable challenge for the cloud with high real-time requirement. To tackle the issue, we propose BBCT, a method of block-wise bit-compression for transformer without retraining. Our method achieves more fine-grained compression of the whole transformer, including embedding, matrix multiplication, GELU, softmax, layer normalization, and all the intermediate results. As a case, we compress an efficient BERT with the method of BBCT. Our benchmark test results on General Language Understanding Evaluation (GLUE) show that BBCT can achieve less than 1% accuracy drop in most tasks.
    Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting. (arXiv:2303.08986v1 [cs.LG])
    In this work, we present some applications of random matrix theory for the training of deep neural networks. Recently, random matrix theory (RMT) has been applied to the overfitting problem in deep learning. Specifically, it has been shown that the spectrum of the weight layers of a deep neural network (DNN) can be studied and understood using techniques from RMT. In this work, these RMT techniques will be used to determine which and how many singular values should be removed from the weight layers of a DNN during training, via singular value decomposition (SVD), so as to reduce overfitting and increase accuracy. We show the results on a simple DNN model trained on MNIST. In general, these techniques may be applied to any fully connected layer of a pretrained DNN to reduce the number of parameters in the layer while preserving and sometimes increasing the accuracy of the DNN.
    Evaluation of distance-based approaches for forensic comparison: Application to hand odor evidence. (arXiv:2303.09126v1 [cs.LG])
    The issue of distinguishing between the same-source and different-source hypotheses based on various types of traces is a generic problem in forensic science. This problem is often tackled with Bayesian approaches, which are able to provide a likelihood ratio that quantifies the relative strengths of evidence supporting each of the two competing hypotheses. Here, we focus on distance-based approaches, whose robustness and specifically whose capacity to deal with high-dimensional evidence are very different, and need to be evaluated and optimized. A unified framework for direct methods based on estimating the likelihoods of the distance between traces under each of the two competing hypotheses, and indirect methods using logistic regression to discriminate between same-source and different-source distance distributions, is presented. Whilst direct methods are more flexible, indirect methods are more robust and quite natural in machine learning. Moreover, indirect methods also enable the use of a vectorial distance, thus preventing the severe information loss suffered by scalar distance approaches.Direct and indirect methods are compared in terms of sensitivity, specificity and robustness, with and without dimensionality reduction, with and without feature selection, on the example of hand odor profiles, a novel and challenging type of evidence in the field of forensics. Empirical evaluations on a large panel of 534 subjects and their 1690 odor traces show the significant superiority of the indirect methods, especially without dimensionality reduction, be it with or without feature selection.
    Gate Recurrent Unit Network based on Hilbert-Schmidt Independence Criterion for State-of-Health Estimation. (arXiv:2303.09497v1 [cs.LG])
    State-of-health (SOH) estimation is a key step in ensuring the safe and reliable operation of batteries. Due to issues such as varying data distribution and sequence length in different cycles, most existing methods require health feature extraction technique, which can be time-consuming and labor-intensive. GRU can well solve this problem due to the simple structure and superior performance, receiving widespread attentions. However, redundant information still exists within the network and impacts the accuracy of SOH estimation. To address this issue, a new GRU network based on Hilbert-Schmidt Independence Criterion (GRU-HSIC) is proposed. First, a zero masking network is used to transform all battery data measured with varying lengths every cycle into sequences of the same length, while still retaining information about the original data size in each cycle. Second, the Hilbert-Schmidt Independence Criterion (HSIC) bottleneck, which evolved from Information Bottleneck (IB) theory, is extended to GRU to compress the information from hidden layers. To evaluate the proposed method, we conducted experiments on datasets from the Center for Advanced Life Cycle Engineering (CALCE) of the University of Maryland and NASA Ames Prognostics Center of Excellence. Experimental results demonstrate that our model achieves higher accuracy than other recurrent models.
    {\mu}Split: efficient image decomposition for microscopy data. (arXiv:2211.12872v2 [cs.CV] UPDATED)
    We present uSplit, a dedicated approach for trained image decomposition in the context of fluorescence microscopy images. We find that best results using regular deep architectures are achieved when large image patches are used during training, making memory consumption the limiting factor to further improving performance. We therefore introduce lateral contextualization (LC), a memory efficient way to train powerful networks and show that LC leads to consistent and significant improvements on the task at hand. We integrate LC with U-Nets, Hierarchical AEs, and Hierarchical VAEs, for which we formulate a modified ELBO loss. Additionally, LC enables training deeper hierarchical models than otherwise possible and, interestingly, helps to reduce tiling artefacts that are inherently impossible to avoid when using tiled VAE predictions. We apply uSplit to five decomposition tasks, one on a synthetic dataset, four others derived from real microscopy data. LC achieves SOTA results (average improvements to the best baseline of 2.36 dB PSNR), while simultaneously requiring considerably less GPU memory.
    Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling. (arXiv:2303.09033v1 [cs.LG])
    Most bandit algorithms assume that the reward variance or its upper bound is known. While variance overestimation is usually safe and sound, it increases regret. On the other hand, an underestimated variance may lead to linear regret due to committing early to a suboptimal arm. This motivated prior works on variance-aware frequentist algorithms. We lay foundations for the Bayesian setting. In particular, we study multi-armed bandits with known and \emph{unknown heterogeneous reward variances}, and develop Thompson sampling algorithms for both and bound their Bayes regret. Our regret bounds decrease with lower reward variances, which make learning easier. The bound for unknown reward variances captures the effect of the prior on learning reward variances and is the first of its kind. Our experiments show the superiority of variance-aware Bayesian algorithms and also highlight their robustness.
    Active Semi-Supervised Learning by Exploring Per-Sample Uncertainty and Consistency. (arXiv:2303.08978v1 [cs.CV])
    Active Learning (AL) and Semi-supervised Learning are two techniques that have been studied to reduce the high cost of deep learning by using a small amount of labeled data and a large amount of unlabeled data. To improve the accuracy of models at a lower cost, we propose a method called Active Semi-supervised Learning (ASSL), which combines AL and SSL. To maximize the synergy between AL and SSL, we focused on the differences between ASSL and AL. ASSL involves more dynamic model updates than AL due to the use of unlabeled data in the training process, resulting in the temporal instability of the predicted probabilities of the unlabeled data. This makes it difficult to determine the true uncertainty of the unlabeled data in ASSL. To address this, we adopted techniques such as exponential moving average (EMA) and upper confidence bound (UCB) used in reinforcement learning. Additionally, we analyzed the effect of label noise on unsupervised learning by using weak and strong augmentation pairs to address datainconsistency. By considering both uncertainty and datainconsistency, we acquired data samples that were used in the proposed ASSL method. Our experiments showed that ASSL achieved about 5.3 times higher computational efficiency than SSL while achieving the same performance, and it outperformed the state-of-the-art AL method.
    Enhancing Data Space Semantic Interoperability through Machine Learning: a Visionary Perspective. (arXiv:2303.08932v1 [cs.DB])
    Our vision paper outlines a plan to improve the future of semantic interoperability in data spaces through the application of machine learning. The use of data spaces, where data is exchanged among members in a self-regulated environment, is becoming increasingly popular. However, the current manual practices of managing metadata and vocabularies in these spaces are time-consuming, prone to errors, and may not meet the needs of all stakeholders. By leveraging the power of machine learning, we believe that semantic interoperability in data spaces can be significantly improved. This involves automatically generating and updating metadata, which results in a more flexible vocabulary that can accommodate the diverse terminologies used by different sub-communities. Our vision for the future of data spaces addresses the limitations of conventional data exchange and makes data more accessible and valuable for all members of the community.
    Predicting nonlinear reshaping of periodic signals in optical fibre with a neural network. (arXiv:2303.09133v1 [physics.optics])
    We deploy a supervised machine-learning model based on a neural network to predict the temporal and spectral reshaping of a simple sinusoidal modulation into a pulse train having a comb structure in the frequency domain, which occurs upon nonlinear propagation in an optical fibre. Both normal and anomalous second-order dispersion regimes of the fibre are studied, and the speed of the neural network is leveraged to probe the space of input parameters for the generation of custom combs or the occurrence of significant temporal or spectral focusing.
    Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms. (arXiv:2303.09048v1 [cs.SD])
    In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance on Voice over Internet Protocol (VoIP) applications. Our approach involves adapting the DNS 2020 models to the specific acoustic characteristics of VoIP communications, which includes distortion and artifacts caused by compression, transmission, and platform-specific processing. To this end, we propose a multi-task learning framework for VoIP-DNS that jointly optimizes noise suppression and VoIP-specific acoustics for speech enhancement. We evaluate our approach on a diverse VoIP scenarios and show that it outperforms both industry performance and state-of-the-art methods for speech enhancement on VoIP applications. Our results demonstrate the potential of models trained on DNS-2020 to be improved and tailored to different VoIP platforms using VoIP-DNS, whose findings have important applications in areas such as speech recognition, voice assistants, and telecommunication.
    Identifiability Results for Multimodal Contrastive Learning. (arXiv:2303.09166v1 [cs.LG])
    Contrastive learning is a cornerstone underlying recent progress in multi-view and multimodal learning, e.g., in representation learning with image/caption pairs. While its effectiveness is not yet fully understood, a line of recent work reveals that contrastive learning can invert the data generating process and recover ground truth latent factors shared between views. In this work, we present new identifiability results for multimodal contrastive learning, showing that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously. Specifically, we distinguish between the multi-view setting with one generative mechanism (e.g., multiple cameras of the same type) and the multimodal setting that is characterized by distinct mechanisms (e.g., cameras and microphones). Our work generalizes previous identifiability results by redefining the generative process in terms of distinct mechanisms with modality-specific latent variables. We prove that contrastive learning can block-identify latent factors shared between modalities, even when there are nontrivial dependencies between factors. We empirically verify our identifiability results with numerical simulations and corroborate our findings on a complex multimodal dataset of image/text pairs. Zooming out, our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
    Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning. (arXiv:2303.09032v1 [cs.LG])
    Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this paper, we propose an exploration method that efficiently encourages cooperative exploration based on the idea of the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees). The high-level intuition is that to perform optimism-based exploration, agents would achieve cooperative strategies if each agent's optimism estimate captures a structured dependency relationship with other agents. At each node (i.e., action) of the search tree, UCT performs optimism-based exploration using a bonus derived by conditioning on the visitation count of its parent node. We provide a perspective to view MARL as tree search iterations and develop a method called Conditionally Optimistic Exploration (COE). We assume agents take actions following a sequential order, and consider nodes at the same depth of the search tree as actions of one individual agent. COE computes each agent's state-action value estimate with an optimistic bonus derived from the visitation count of the state and joint actions taken by agents up to the current agent. COE is adaptable to any value decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.
    A Picture is Worth a Thousand Words: Language Models Plan from Pixels. (arXiv:2303.09031v1 [cs.CL])
    Planning is an important capability of artificial agents that perform long-horizon tasks in real-world environments. In this work, we explore the use of pre-trained language models (PLMs) to reason about plan sequences from text instructions in embodied visual environments. Prior PLM based approaches for planning either assume observations are available in the form of text (e.g., provided by a captioning model), reason about plans from the instruction alone, or incorporate information about the visual environment in limited ways (such as a pre-trained affordance function). In contrast, we show that PLMs can accurately plan even when observations are directly encoded as input prompts for the PLM. We show that this simple approach outperforms prior approaches in experiments on the ALFWorld and VirtualHome benchmarks.
    A Multimodal Data-driven Framework for Anxiety Screening. (arXiv:2303.09041v1 [cs.LG])
    Early screening for anxiety and appropriate interventions are essential to reduce the incidence of self-harm and suicide in patients. Due to limited medical resources, traditional methods that overly rely on physician expertise and specialized equipment cannot simultaneously meet the needs for high accuracy and model interpretability. Multimodal data can provide more objective evidence for anxiety screening to improve the accuracy of models. The large amount of noise in multimodal data and the unbalanced nature of the data make the model prone to overfitting. However, it is a non-differentiable problem when high-dimensional and multimodal feature combinations are used as model inputs and incorporated into model training. This causes existing anxiety screening methods based on machine learning and deep learning to be inapplicable. Therefore, we propose a multimodal data-driven anxiety screening framework, namely MMD-AS, and conduct experiments on the collected health data of over 200 seafarers by smartphones. The proposed framework's feature extraction, dimension reduction, feature selection, and anxiety inference are jointly trained to improve the model's performance. In the feature selection step, a feature selection method based on the Improved Fireworks Algorithm is used to solve the non-differentiable problem of feature combination to remove redundant features and search for the ideal feature subset. The experimental results show that our framework outperforms the comparison methods.
    Dataflow graphs as complete causal graphs. (arXiv:2303.09552v1 [cs.SE])
    Component-based development is one of the core principles behind modern software engineering practices. Understanding of causal relationships between components of a software system can yield significant benefits to developers. Yet modern software design approaches make it difficult to track and discover such relationships at system scale, which leads to growing intellectual debt. In this paper we consider an alternative approach to software design, flow-based programming (FBP), and draw the attention of the community to the connection between dataflow graphs produced by FBP and structural causal models. With expository examples we show how this connection can be leveraged to improve day-to-day tasks in software projects, including fault localisation, business analysis and experimentation.
    SigVIC: Spatial Importance Guided Variable-Rate Image Compression. (arXiv:2303.09112v1 [eess.IV])
    Variable-rate mechanism has improved the flexibility and efficiency of learning-based image compression that trains multiple models for different rate-distortion tradeoffs. One of the most common approaches for variable-rate is to channel-wisely or spatial-uniformly scale the internal features. However, the diversity of spatial importance is instructive for bit allocation of image compression. In this paper, we introduce a Spatial Importance Guided Variable-rate Image Compression (SigVIC), in which a spatial gating unit (SGU) is designed for adaptively learning a spatial importance mask. Then, a spatial scaling network (SSN) takes the spatial importance mask to guide the feature scaling and bit allocation for variable-rate. Moreover, to improve the quality of decoded image, Top-K shallow features are selected to refine the decoded features through a shallow feature fusion module (SFFM). Experiments show that our method outperforms other learning-based methods (whether variable-rate or not) and traditional codecs, with storage saving and high flexibility.
    Learning Rewards to Optimize Global Performance Metrics in Deep Reinforcement Learning. (arXiv:2303.09027v1 [cs.LG])
    When applying reinforcement learning (RL) to a new problem, reward engineering is a necessary, but often difficult and error-prone task a system designer has to face. To avoid this step, we propose LR4GPM, a novel (deep) RL method that can optimize a global performance metric, which is supposed to be available as part of the problem description. LR4GPM alternates between two phases: (1) learning a (possibly vector) reward function used to fit the performance metric, and (2) training a policy to optimize an approximation of this performance metric based on the learned rewards. Such RL training is not straightforward since both the reward function and the policy are trained using non-stationary data. To overcome this issue, we propose several training tricks. We demonstrate the efficiency of LR4GPM on several domains. Notably, LR4GPM outperforms the winner of a recent autonomous driving competition organized at DAI'2020.
    Preoperative Prognosis Assessment of Lumbar Spinal Surgery for Low Back Pain and Sciatica Patients based on Multimodalities and Multimodal Learning. (arXiv:2303.09085v1 [cs.LG])
    Low back pain (LBP) and sciatica may require surgical therapy when they are symptomatic of severe pain. However, there is no effective measures to evaluate the surgical outcomes in advance. This work combined elements of Eastern medicine and machine learning, and developed a preoperative assessment tool to predict the prognosis of lumbar spinal surgery in LBP and sciatica patients. Standard operative assessments, traditional Chinese medicine body constitution assessments, planned surgical approach, and vowel pronunciation recordings were collected and stored in different modalities. Our work provides insights into leveraging modality combinations, multimodals, and fusion strategies. The interpretability of models and correlations between modalities were also inspected. Based on the recruited 105 patients, we found that combining standard operative assessments, body constitution assessments, and planned surgical approach achieved the best performance in 0.81 accuracy. Our approach is effective and can be widely applied in general practice due to simplicity and effective.
    Agglomeration of Polygonal Grids using Graph Neural Networks with applications to Multigrid solvers. (arXiv:2210.17457v2 [math.NA] UPDATED)
    Agglomeration-based strategies are important both within adaptive refinement algorithms and to construct scalable multilevel algebraic solvers. In order to automatically perform agglomeration of polygonal grids, we propose the use of Machine Learning (ML) strategies, that can naturally exploit geometrical information about the mesh in order to preserve the grid quality, enhancing performance of numerical methods and reducing the overall computational cost. In particular, we employ the k-means clustering algorithm and Graph Neural Networks (GNNs) to partition the connectivity graph of a computational mesh. Moreover, GNNs have high online inference speed and the advantage to process naturally and simultaneously both the graph structure of mesh and the geometrical information, such as the areas of the elements or their barycentric coordinates. These techniques are compared with METIS, a standard algorithm for graph partitioning, which is meant to process only the graph information of the mesh. We demonstrate that performance in terms of quality metrics is enhanced for ML strategies. Such models also show a good degree of generalization when applied to more complex geometries, such as brain MRI scans, and the capability of preserving the quality of the grid. The effectiveness of these strategies is demonstrated also when applied to MultiGrid (MG) solvers in a Polygonal Discontinuous Galerkin (PolyDG) framework. In the considered experiments, GNNs show overall the best performance in terms of inference speed, accuracy and flexibility of the approach.
    Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP. (arXiv:2303.09528v1 [cs.LG])
    Continuous-time Markov decision processes (CTMDPs) are canonical models to express sequential decision-making under dense-time and stochastic environments. When the stochastic evolution of the environment is only available via sampling, model-free reinforcement learning (RL) is the algorithm-of-choice to compute optimal decision sequence. RL, on the other hand, requires the learning objective to be encoded as scalar reward signals. Since doing such translations manually is both tedious and error-prone, a number of techniques have been proposed to translate high-level objectives (expressed in logic or automata formalism) to scalar rewards for discrete-time Markov decision processes (MDPs). Unfortunately, no automatic translation exists for CTMDPs. We consider CTMDP environments against the learning objectives expressed as omega-regular languages. Omega-regular languages generalize regular languages to infinite-horizon specifications and can express properties given in popular linear-time logic LTL. To accommodate the dense-time nature of CTMDPs, we consider two different semantics of omega-regular objectives: 1) satisfaction semantics where the goal of the learner is to maximize the probability of spending positive time in the good states, and 2) expectation semantics where the goal of the learner is to optimize the long-run expected average time spent in the ``good states" of the automaton. We present an approach enabling correct translation to scalar reward signals that can be readily used by off-the-shelf RL algorithms for CTMDPs. We demonstrate the effectiveness of the proposed algorithms by evaluating it on some popular CTMDP benchmarks with omega-regular objectives.
    SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference. (arXiv:2303.09266v1 [cs.CL])
    Dynamic early exiting has been proven to improve the inference speed of the pre-trained language model like BERT. However, all samples must go through all consecutive layers before early exiting and more complex samples usually go through more layers, which still exists redundant computation. In this paper, we propose a novel dynamic early exiting combined with layer skipping for BERT inference named SmartBERT, which adds a skipping gate and an exiting operator into each layer of BERT. SmartBERT can adaptively skip some layers and adaptively choose whether to exit. Besides, we propose cross-layer contrastive learning and combine it into our training phases to boost the intermediate layers and classifiers which would be beneficial for early exiting. To keep the consistent usage of skipping gates between training and inference phases, we propose a hard weight mechanism during training phase. We conduct experiments on eight classification datasets of the GLUE benchmark. Experimental results show that SmartBERT achieves 2-3x computation reduction with minimal accuracy drops compared with BERT and our method outperforms previous methods in both efficiency and accuracy. Moreover, in some complex datasets like RTE and WNLI, we prove that the early exiting based on entropy hardly works, and the skipping mechanism is essential for reducing computation.
    Constrained Monotonic Neural Networks. (arXiv:2205.11775v3 [cs.LG] UPDATED)
    Deep neural networks are becoming increasingly popular in approximating arbitrary functions from noisy data. But wider adoption is being hindered by the need to explain such models and to impose additional constraints on them. Monotonicity constraint is one of the most requested properties in real-world scenarios and is the focus of this paper. One of the oldest ways to construct a monotonic fully connected neural network is to constrain its weights to be non-negative while employing a monotonic activation function. Unfortunately, this construction does not work with popular non-saturated activation functions such as ReLU, ELU, SELU etc, as it can only approximate convex functions. We show this shortcoming can be fixed by employing the original activation function for a part of the neurons in the layer, and employing its point reflection for the other part. Our experiments show this approach of building monotonic deep neural networks have matching or better accuracy when compared to other state-of-the-art methods such as deep lattice networks or monotonic networks obtained by heuristic regularization. This method is the simplest one in the sense of having the least number of parameters, not requiring any modifications to the learning procedure or steps post-learning steps.
    Arbitrary Order Meta-Learning with Simple Population-Based Evolution. (arXiv:2303.09478v1 [cs.LG])
    Meta-learning, the notion of learning to learn, enables learning systems to quickly and flexibly solve new tasks. This usually involves defining a set of outer-loop meta-parameters that are then used to update a set of inner-loop parameters. Most meta-learning approaches use complicated and computationally expensive bi-level optimisation schemes to update these meta-parameters. Ideally, systems should perform multiple orders of meta-learning, i.e. to learn to learn to learn and so on, to accelerate their own learning. Unfortunately, standard meta-learning techniques are often inappropriate for these higher-order meta-parameters because the meta-optimisation procedure becomes too complicated or unstable. Inspired by the higher-order meta-learning we observe in real-world evolution, we show that using simple population-based evolution implicitly optimises for arbitrarily-high order meta-parameters. First, we theoretically prove and empirically show that population-based evolution implicitly optimises meta-parameters of arbitrarily-high order in a simple setting. We then introduce a minimal self-referential parameterisation, which in principle enables arbitrary-order meta-learning. Finally, we show that higher-order meta-learning improves performance on time series forecasting tasks.
    Topology optimization with physics-informed neural networks: application to noninvasive detection of hidden geometries. (arXiv:2303.09280v1 [cs.LG])
    Detecting hidden geometrical structures from surface measurements under electromagnetic, acoustic, or mechanical loading is the goal of noninvasive imaging techniques in medical and industrial applications. Solving the inverse problem can be challenging due to the unknown topology and geometry, the sparsity of the data, and the complexity of the physical laws. Physics-informed neural networks (PINNs) have shown promise as a simple-yet-powerful tool for problem inversion, but they have yet to be applied to general problems with a priori unknown topology. Here, we introduce a topology optimization framework based on PINNs that solves geometry detection problems without prior knowledge of the number or types of shapes. We allow for arbitrary solution topology by representing the geometry using a material density field that approaches binary values thanks to a novel eikonal regularization. We validate our framework by detecting the number, locations, and shapes of hidden voids and inclusions in linear and nonlinear elastic bodies using measurements of outer surface displacement from a single mechanical loading experiment. Our methodology opens a pathway for PINNs to solve various engineering problems targeting geometry optimization.
    Tackling Clutter in Radar Data -- Label Generation and Detection Using PointNet++. (arXiv:2303.09530v1 [cs.CV])
    Radar sensors employed for environment perception, e.g. in autonomous vehicles, output a lot of unwanted clutter. These points, for which no corresponding real objects exist, are a major source of errors in following processing steps like object detection or tracking. We therefore present two novel neural network setups for identifying clutter. The input data, network architectures and training configuration are adjusted specifically for this task. Special attention is paid to the downsampling of point clouds composed of multiple sensor scans. In an extensive evaluation, the new setups display substantially better performance than existing approaches. Because there is no suitable public data set in which clutter is annotated, we design a method to automatically generate the respective labels. By applying it to existing data with object annotations and releasing its code, we effectively create the first freely available radar clutter data set representing real-world driving scenarios. Code and instructions are accessible at www.github.com/kopp-j/clutter-ds.
    SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning. (arXiv:2303.09079v1 [cs.CR])
    Self-supervised learning (SSL) is a commonly used approach to learning and encoding data representations. By using a pre-trained SSL image encoder and training a downstream classifier on top of it, impressive performance can be achieved on various tasks with very little labeled data. The increasing usage of SSL has led to an uptick in security research related to SSL encoders and the development of various Trojan attacks. The danger posed by Trojan attacks inserted in SSL encoders lies in their ability to operate covertly and spread widely among various users and devices. The presence of backdoor behavior in Trojaned encoders can inadvertently be inherited by downstream classifiers, making it even more difficult to detect and mitigate the threat. Although current Trojan detection methods in supervised learning can potentially safeguard SSL downstream classifiers, identifying and addressing triggers in the SSL encoder before its widespread dissemination is a challenging task. This is because downstream tasks are not always known, dataset labels are not available, and even the original training dataset is not accessible during the SSL encoder Trojan detection. This paper presents an innovative technique called SSL-Cleanse that is designed to detect and mitigate backdoor attacks in SSL encoders. We evaluated SSL-Cleanse on various datasets using 300 models, achieving an average detection success rate of 83.7% on ImageNet-100. After mitigating backdoors, on average, backdoored encoders achieve 0.24% attack success rate without great accuracy loss, proving the effectiveness of SSL-Cleanse.
    Conditional Synthetic Food Image Generation. (arXiv:2303.09005v1 [cs.CV])
    Generative Adversarial Networks (GAN) have been widely investigated for image synthesis based on their powerful representation learning ability. In this work, we explore the StyleGAN and its application of synthetic food image generation. Despite the impressive performance of GAN for natural image generation, food images suffer from high intra-class diversity and inter-class similarity, resulting in overfitting and visual artifacts for synthetic images. Therefore, we aim to explore the capability and improve the performance of GAN methods for food image generation. Specifically, we first choose StyleGAN3 as the baseline method to generate synthetic food images and analyze the performance. Then, we identify two issues that can cause performance degradation on food images during the training phase: (1) inter-class feature entanglement during multi-food classes training and (2) loss of high-resolution detail during image downsampling. To address both issues, we propose to train one food category at a time to avoid feature entanglement and leverage image patches cropped from high-resolution datasets to retain fine details. We evaluate our method on the Food-101 dataset and show improved quality of generated synthetic food images compared with the baseline. Finally, we demonstrate the great potential of improving the performance of downstream tasks, such as food image classification by including high-quality synthetic training samples in the data augmentation.
    Transformer-based Planning for Symbolic Regression. (arXiv:2303.06833v2 [cs.LG] UPDATED)
    Symbolic regression (SR) is a challenging task in machine learning that involves finding a mathematical expression for a function based on its values. Recent advancements in SR have demonstrated the efficacy of pretrained transformer-based models for generating equations as sequences, which benefit from large-scale pretraining on synthetic datasets and offer considerable advantages over GP-based methods in terms of inference time. However, these models focus on supervised pretraining goals borrowed from text generation and ignore equation-specific objectives like accuracy and complexity. To address this, we propose TPSR, a Transformer-based Planning strategy for Symbolic Regression that incorporates Monte Carlo Tree Search into the transformer decoding process. TPSR, as opposed to conventional decoding strategies, allows for the integration of non-differentiable feedback, such as fitting accuracy and complexity, as external sources of knowledge into the equation generation process. Extensive experiments on various datasets show that our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, extrapolation abilities, and robustness to noise. We also demonstrate that the utilization of various caching mechanisms can further enhance the efficiency of TPSR.
    Exploring Distributional Shifts in Large Language Models for Code Analysis. (arXiv:2303.09128v1 [cs.CL])
    We systematically study the capacity of two large language models for code - CodeT5 and Codex - to generalize to out-of-domain data. In this study, we consider two fundamental applications - code summarization, and code generation. We split data into domains following its natural boundaries - by an organization, by a project, and by a module within the software project. This makes recognition of in-domain vs out-of-domain data at the time of deployment trivial. We establish that samples from each new domain present both models with a significant challenge of distribution shift. We study how well different established methods can adapt models to better generalize to new domains. Our experiments show that while multitask learning alone is a reasonable baseline, combining it with few-shot finetuning on examples retrieved from training data can achieve very strong performance. In fact, according to our experiments, this solution can outperform direct finetuning for very low-data scenarios. Finally, we consider variations of this approach to create a more broadly applicable method to adapt to multiple domains at once. We find that in the case of code generation, a model adapted to multiple domains simultaneously performs on par with those adapted to each domain individually.
    Plant Disease Detection using Region-Based Convolutional Neural Network. (arXiv:2303.09063v1 [cs.CV])
    Agriculture plays an important role in the food and economy of Bangladesh. The rapid growth of population over the years also has increased the demand for food production. One of the major reasons behind low crop production is numerous bacteria, virus and fungal plant diseases. Early detection of plant diseases and proper usage of pesticides and fertilizers are vital for preventing the diseases and boost the yield. Most of the farmers use generalized pesticides and fertilizers in the entire fields without specifically knowing the condition of the plants. Thus the production cost oftentimes increases, and, not only that, sometimes this becomes detrimental to the yield. Deep Learning models are found to be very effective to automatically detect plant diseases from images of plants, thereby reducing the need for human specialists. This paper aims at building a lightweight deep learning model for predicting leaf disease in tomato plants. By modifying the region-based convolutional neural network, we design an efficient and effective model that demonstrates satisfactory empirical performance on a benchmark dataset. Our proposed model can easily be deployed in a larger system where drones take images of leaves and these images will be fed into our model to know the health condition.
    Controlled Descent Training. (arXiv:2303.09216v1 [math.OC])
    In this work, a novel and model-based artificial neural network (ANN) training method is developed supported by optimal control theory. The method augments training labels in order to robustly guarantee training loss convergence and improve training convergence rate. Dynamic label augmentation is proposed within the framework of gradient descent training where the convergence of training loss is controlled. First, we capture the training behavior with the help of empirical Neural Tangent Kernels (NTK) and borrow tools from systems and control theory to analyze both the local and global training dynamics (e.g. stability, reachability). Second, we propose to dynamically alter the gradient descent training mechanism via fictitious labels as control inputs and an optimal state feedback policy. In this way, we enforce locally $\mathcal{H}_2$ optimal and convergent training behavior. The novel algorithm, \textit{Controlled Descent Training} (CDT), guarantees local convergence. CDT unleashes new potentials in the analysis, interpretation, and design of ANN architectures. The applicability of the method is demonstrated on standard regression and classification problems.
    Large-scale End-of-Life Prediction of Hard Disks in Distributed Datacenters. (arXiv:2303.08955v1 [cs.LG])
    On a daily basis, data centers process huge volumes of data backed by the proliferation of inexpensive hard disks. Data stored in these disks serve a range of critical functional needs from financial, and healthcare to aerospace. As such, premature disk failure and consequent loss of data can be catastrophic. To mitigate the risk of failures, cloud storage providers perform condition-based monitoring and replace hard disks before they fail. By estimating the remaining useful life of hard disk drives, one can predict the time-to-failure of a particular device and replace it at the right time, ensuring maximum utilization whilst reducing operational costs. In this work, large-scale predictive analyses are performed using severely skewed health statistics data by incorporating customized feature engineering and a suite of sequence learners. Past work suggests using LSTMs as an excellent approach to predicting remaining useful life. To this end, we present an encoder-decoder LSTM model where the context gained from understanding health statistics sequences aid in predicting an output sequence of the number of days remaining before a disk potentially fails. The models developed in this work are trained and tested across an exhaustive set of all of the 10 years of S.M.A.R.T. health data in circulation from Backblaze and on a wide variety of disk instances. It closes the knowledge gap on what full-scale training achieves on thousands of devices and advances the state-of-the-art by providing tangible metrics for evaluation and generalization for practitioners looking to extend their workflow to all years of health data in circulation across disk manufacturers. The encoder-decoder LSTM posted an RMSE of 0.83 on an exhaustive set while being able to generalize competitively over the other Seagate family hard drives.
    Machine learning based biomedical image processing for echocardiographic images. (arXiv:2303.09103v1 [eess.IV])
    The popularity of Artificial intelligence and machine learning have prompted researchers to use it in the recent researches. The proposed method uses K-Nearest Neighbor (KNN) algorithm for segmentation of medical images, extracting of image features for analysis by classifying the data based on the neural networks. Classification of the images in medical imaging is very important, KNN is one suitable algorithm which is simple, conceptual and computational, which provides very good accuracy in results. KNN algorithm is a unique user-friendly approach with wide range of applications in machine learning algorithms which are majorly used for the various image processing applications including classification, segmentation and regression issues of the image processing. The proposed system uses gray level co-occurrence matrix features. The trained neural network has been tested successfully on a group of echocardiographic images, errors were compared using regression plot. The results of the algorithm are tested using various quantitative as well as qualitative metrics and proven to exhibit better performance in terms of both quantitative and qualitative metrics in terms of current state-of-the-art methods in the related area. To compare the performance of trained neural network the regression analysis performed showed a good correlation.
    Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning. (arXiv:2303.09483v1 [cs.LG])
    In contrast to the natural capabilities of humans to learn new tasks in a sequential fashion, neural networks are known to suffer from catastrophic forgetting, where the model's performances on old tasks drop dramatically after being optimized for a new task. Since then, the continual learning (CL) community has proposed several solutions aiming to equip the neural network with the ability to learn the current task (plasticity) while still achieving high accuracy on the previous tasks (stability). Despite remarkable improvements, the plasticity-stability trade-off is still far from being solved and its underlying mechanism is poorly understood. In this work, we propose Auxiliary Network Continual Learning (ANCL), a novel method that applies an additional auxiliary network which promotes plasticity to the continually learned model which mainly focuses on stability. More concretely, the proposed framework materializes in a regularizer that naturally interpolates between plasticity and stability, surpassing strong baselines on task incremental and class incremental scenarios. Through extensive analyses on ANCL solutions, we identify some essential principles beneath the stability-plasticity trade-off.
    Finding Minimum-Cost Explanations for Predictions made by Tree Ensembles. (arXiv:2303.09271v1 [cs.LG])
    The ability to explain why a machine learning model arrives at a particular prediction is crucial when used as decision support by human operators of critical systems. The provided explanations must be provably correct, and preferably without redundant information, called minimal explanations. In this paper, we aim at finding explanations for predictions made by tree ensembles that are not only minimal, but also minimum with respect to a cost function. To this end, we first present a highly efficient oracle that can determine the correctness of explanations, surpassing the runtime performance of current state-of-the-art alternatives by several orders of magnitude when computing minimal explanations. Secondly, we adapt an algorithm called MARCO from related works (calling it m-MARCO) for the purpose of computing a single minimum explanation per prediction, and demonstrate an overall speedup factor of two compared to the MARCO algorithm which enumerates all minimal explanations. Finally, we study the obtained explanations from a range of use cases, leading to further insights of their characteristics. In particular, we observe that in several cases, there are more than 100,000 minimal explanations to choose from for a single prediction. In these cases, we see that only a small portion of the minimal explanations are also minimum, and that the minimum explanations are significantly less verbose, hence motivating the aim of this work.
    Stock Trend Prediction: A Semantic Segmentation Approach. (arXiv:2303.09323v1 [q-fin.ST])
    Market financial forecasting is a trending area in deep learning. Deep learning models are capable of tackling the classic challenges in stock market data, such as its extremely complicated dynamics as well as long-term temporal correlation. To capture the temporal relationship among these time series, recurrent neural networks are employed. However, it is difficult for recurrent models to learn to keep track of long-term information. Convolutional Neural Networks have been utilized to better capture the dynamics and extract features for both short- and long-term forecasting. However, semantic segmentation and its well-designed fully convolutional networks have never been studied for time-series dense classification. We present a novel approach to predict long-term daily stock price change trends with fully 2D-convolutional encoder-decoders. We generate input frames with daily prices for a time-frame of T days. The aim is to predict future trends by pixel-wise classification of the current price frame. We propose a hierarchical CNN structure to encode multiple price frames to multiscale latent representation in parallel using Atrous Spatial Pyramid Pooling blocks and take that temporal coarse feature stacks into account in the decoding stages. Our hierarchical structure of CNNs makes it capable of capturing both long and short-term temporal relationships effectively. The effect of increasing the input time horizon via incrementing parallel encoders has been studied with interesting and substantial changes in the output segmentation masks. We achieve overall accuracy and AUC of %78.18 and 0.88 for joint trend prediction over the next 20 days, surpassing other semantic segmentation approaches. We compared our proposed model with several deep models specifically designed for technical analysis and found that for different output horizons, our proposed models outperformed other models.
    Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement. (arXiv:2303.08983v1 [cs.CV])
    We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-based models and performing large-scale study of distillation with state-of-the-art models with various data augmentations. We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+. Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks (e.g., segmentation and detection). As an example, the accuracy of ResNet-50 improves by 1.7% on the ImageNet validation set, 3.5% on ImageNetV2, and 10.0% on ImageNet-R. Expected Calibration Error (ECE) on the ImageNet validation set is also reduced by 9.9%. Using this backbone with Mask-RCNN for object detection on MS-COCO, the mean average precision improves by 0.8%. We reach similar gains for MobileNets, ViTs, and Swin-Transformers. For MobileNetV3 and Swin-Tiny we observe significant improvements on ImageNet-R/A/C of up to 10% improved robustness. Models pretrained on ImageNet+ and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+, reach up to 3.4% improved accuracy.
    Physics-Informed Neural Networks for Time-Domain Simulations: Accuracy, Computational Cost, and Flexibility. (arXiv:2303.08994v1 [eess.SY])
    The simulation of power system dynamics poses a computationally expensive task. Considering the growing uncertainty of generation and demand patterns, thousands of scenarios need to be continuously assessed to ensure the safety of power systems. Physics-Informed Neural Networks (PINNs) have recently emerged as a promising solution for drastically accelerating computations of non-linear dynamical systems. This work investigates the applicability of these methods for power system dynamics, focusing on the dynamic response to load disturbances. Comparing the prediction of PINNs to the solution of conventional solvers, we find that PINNs can be 10 to 1000 times faster than conventional solvers. At the same time, we find them to be sufficiently accurate and numerically stable even for large time steps. To facilitate a deeper understanding, this paper also presents a new regularisation of Neural Network (NN) training by introducing a gradient-based term in the loss function. The resulting NNs, which we call dtNNs, help us deliver a comprehensive analysis about the strengths and weaknesses of the NN based approaches, how incorporating knowledge of the underlying physics affects NN performance, and how this compares with conventional solvers for power system dynamics.
    Applying unsupervised keyphrase methods on concepts extracted from discharge sheets. (arXiv:2303.08928v1 [cs.CL])
    Clinical notes containing valuable patient information are written by different health care providers with various scientific levels and writing styles. It might be helpful for clinicians and researchers to understand what information is essential when dealing with extensive electronic medical records. Entities recognizing and mapping them to standard terminologies is crucial in reducing ambiguity in processing clinical notes. Although named entity recognition and entity linking are critical steps in clinical natural language processing, they can also result in the production of repetitive and low-value concepts. In other hand, all parts of a clinical text do not share the same importance or content in predicting the patient's condition. As a result, it is necessary to identify the section in which each content is recorded and also to identify key concepts to extract meaning from clinical texts. In this study, these challenges have been addressed by using clinical natural language processing techniques. In addition, in order to identify key concepts, a set of popular unsupervised key phrase extraction methods has been verified and evaluated. Considering that most of the clinical concepts are in the form of multi-word expressions and their accurate identification requires the user to specify n-gram range, we have proposed a shortcut method to preserve the structure of the expression based on TF-IDF. In order to evaluate the pre-processing method and select the concepts, we have designed two types of downstream tasks (multiple and binary classification) using the capabilities of transformer-based models. The obtained results show the superiority of proposed method in combination with SciBERT model, also offer an insight into the efficacy of general extracting essential phrase methods for clinical notes.  ( 3 min )
    Machine Learning for Flow Cytometry Data Analysis. (arXiv:2303.09007v1 [cs.LG])
    Flow cytometry mainly used for detecting the characteristics of a number of biochemical substances based on the expression of specific markers in cells. It is particularly useful for detecting membrane surface receptors, antigens, ions, or during DNA/RNA expression. Not only can it be employed as a biomedical research tool for recognising distinctive types of cells in mixed populations, but it can also be used as a diagnostic tool for classifying abnormal cell populations connected with disease. Modern flow cytometers can rapidly analyse tens of thousands of cells at the same time while also measuring multiple parameters from a single cell. However, the rapid development of flow cytometers makes it challenging for conventional analysis methods to interpret flow cytometry data. Researchers need to be able to distinguish interesting-looking cell populations manually in multi-dimensional data collected from millions of cells. Thus, it is essential to find a robust approach for analysing flow cytometry data automatically, specifically in identifying cell populations automatically. This thesis mainly concerns discover the potential shortcoming of current automated-gating algorithms in both real datasets and synthetic datasets. Three representative automated clustering algorithms are selected to be applied, compared and evaluated by completely and partially automated gating. A subspace clustering ProClus also implemented in this thesis. The performance of ProClus in flow cytometry is not well, but it is still a useful algorithm to detect noise.  ( 2 min )
    Certifiable (Multi)Robustness Against Patch Attacks Using ERM. (arXiv:2303.08944v1 [cs.LG])
    Consider patch attacks, where at test-time an adversary manipulates a test image with a patch in order to induce a targeted misclassification. We consider a recent defense to patch attacks, Patch-Cleanser (Xiang et al. [2022]). The Patch-Cleanser algorithm requires a prediction model to have a ``two-mask correctness'' property, meaning that the prediction model should correctly classify any image when any two blank masks replace portions of the image. Xiang et al. learn a prediction model to be robust to two-mask operations by augmenting the training set with pairs of masks at random locations of training images and performing empirical risk minimization (ERM) on the augmented dataset. However, in the non-realizable setting when no predictor is perfectly correct on all two-mask operations on all images, we exhibit an example where ERM fails. To overcome this challenge, we propose a different algorithm that provably learns a predictor robust to all two-mask operations using an ERM oracle, based on prior work by Feige et al. [2015]. We also extend this result to a multiple-group setting, where we can learn a predictor that achieves low robust loss on all groups simultaneously.  ( 2 min )
    Gated Compression Layers for Efficient Always-On Models. (arXiv:2303.08970v1 [cs.LG])
    Mobile and embedded machine learning developers frequently have to compromise between two inferior on-device deployment strategies: sacrifice accuracy and aggressively shrink their models to run on dedicated low-power cores; or sacrifice battery by running larger models on more powerful compute cores such as neural processing units or the main application processor. In this paper, we propose a novel Gated Compression layer that can be applied to transform existing neural network architectures into Gated Neural Networks. Gated Neural Networks have multiple properties that excel for on-device use cases that help significantly reduce power, boost accuracy, and take advantage of heterogeneous compute cores. We provide results across five public image and audio datasets that demonstrate the proposed Gated Compression layer effectively stops up to 96% of negative samples, compresses 97% of positive samples, while maintaining or improving model accuracy.  ( 2 min )
    Forecasting Particle Accelerator Interruptions Using Logistic LASSO Regression. (arXiv:2303.08984v1 [physics.acc-ph])
    Unforeseen particle accelerator interruptions, also known as interlocks, lead to abrupt operational changes despite being necessary safety measures. These may result in substantial loss of beam time and perhaps even equipment damage. We propose a simple yet powerful binary classification model aiming to forecast such interruptions, in the case of the High Intensity Proton Accelerator complex at the Paul Scherrer Institut. The model is formulated as logistic regression penalized by least absolute shrinkage and selection operator, based on a statistical two sample test to distinguish between unstable and stable states of the accelerator. The primary objective for receiving alarms prior to interlocks is to allow for countermeasures and reduce beam time loss. Hence, a continuous evaluation metric is developed to measure the saved beam time in any period, given the assumption that interlocks could be circumvented by reducing the beam current. The best-performing interlock-to-stable classifier can potentially increase the beam time by around 5 min in a day. Possible instrumentation for fast adjustment of the beam current is also listed and discussed.  ( 2 min )
    NESS: Learning Node Embeddings from Static SubGraphs. (arXiv:2303.08958v1 [cs.LG])
    We present a framework for learning Node Embeddings from Static Subgraphs (NESS) using a graph autoencoder (GAE) in a transductive setting. Moreover, we propose a novel approach for contrastive learning in the same setting. We demonstrate that using static subgraphs during training with a GAE improves node representation for link prediction tasks compared to current autoencoding methods using the entire graph or stochastic subgraphs. NESS consists of two steps: 1) Partitioning the training graph into subgraphs using random edge split (RES) during data pre-processing, and 2) Aggregating the node representations learned from each subgraph to obtain a joint representation of the graph at test time. Our experiments show that NESS improves the performance of a wide range of graph encoders and achieves state-of-the-art (SOTA) results for link prediction on multiple benchmark datasets.  ( 2 min )
    Self-Inspection Method of Unmanned Aerial Vehicles in Power Plants Using Deep Q-Network Reinforcement Learning. (arXiv:2303.09013v1 [cs.RO])
    For the purpose of inspecting power plants, autonomous robots can be built using reinforcement learning techniques. The method replicates the environment and employs a simple reinforcement learning (RL) algorithm. This strategy might be applied in several sectors, including the electricity generation sector. A pre-trained model with perception, planning, and action is suggested by the research. To address optimization problems, such as the Unmanned Aerial Vehicle (UAV) navigation problem, Deep Q-network (DQN), a reinforcement learning-based framework that Deepmind launched in 2015, incorporates both deep learning and Q-learning. To overcome problems with current procedures, the research proposes a power plant inspection system incorporating UAV autonomous navigation and DQN reinforcement learning. These training processes set reward functions with reference to states and consider both internal and external effect factors, which distinguishes them from other reinforcement learning training techniques now in use. The key components of the reinforcement learning segment of the technique, for instance, introduce states such as the simulation of a wind field, the battery charge level of an unmanned aerial vehicle, the height the UAV reached, etc. The trained model makes it more likely that the inspection strategy will be applied in practice by enabling the UAV to move around on its own in difficult environments. The average score of the model converges to 9,000. The trained model allowed the UAV to make the fewest number of rotations necessary to go to the target point.  ( 2 min )
    Knowledge Transfer for Pseudo-code Generation from Low Resource Programming Language. (arXiv:2303.09062v1 [cs.SE])
    Generation of pseudo-code descriptions of legacy source code for software maintenance is a manually intensive task. Recent encoder-decoder language models have shown promise for automating pseudo-code generation for high resource programming languages such as C++, but are heavily reliant on the availability of a large code-pseudocode corpus. Soliciting such pseudocode annotations for codes written in legacy programming languages (PL) is a time consuming and costly affair requiring a thorough understanding of the source PL. In this paper, we focus on transferring the knowledge acquired by the code-to-pseudocode neural model trained on a high resource PL (C++) using parallel code-pseudocode data. We aim to transfer this knowledge to a legacy PL (C) with no PL-pseudocode parallel data for training. To achieve this, we utilize an Iterative Back Translation (IBT) approach with a novel test-cases based filtration strategy, to adapt the trained C++-to-pseudocode model to C-to-pseudocode model. We observe an improvement of 23.27% in the success rate of the generated C codes through back translation, over the successive IBT iteration, illustrating the efficacy of our approach.  ( 2 min )
    Class-Guided Image-to-Image Diffusion: Cell Painting from Brightfield Images with Class Labels. (arXiv:2303.08863v1 [cs.CV])
    Image-to-image reconstruction problems with free or inexpensive metadata in the form of class labels appear often in biological and medical image domains. Existing text-guided or style-transfer image-to-image approaches do not translate to datasets where additional information is provided as discrete classes. We introduce and implement a model which combines image-to-image and class-guided denoising diffusion probabilistic models. We train our model on a real-world dataset of microscopy images used for drug discovery, with and without incorporating metadata labels. By exploring the properties of image-to-image diffusion with relevant labels, we show that class-guided image-to-image diffusion can improve the meaningful content of the reconstructed images and outperform the unguided model in useful downstream tasks.  ( 2 min )
    Bayesian Quadrature for Neural Ensemble Search. (arXiv:2303.08874v1 [stat.ML])
    Ensembling can improve the performance of Neural Networks, but existing approaches struggle when the architecture likelihood surface has dispersed, narrow peaks. Furthermore, existing methods construct equally weighted ensembles, and this is likely to be vulnerable to the failure modes of the weaker architectures. By viewing ensembling as approximately marginalising over architectures we construct ensembles using the tools of Bayesian Quadrature -- tools which are well suited to the exploration of likelihood surfaces with dispersed, narrow peaks. Additionally, the resulting ensembles consist of architectures weighted commensurate with their performance. We show empirically -- in terms of test likelihood, accuracy, and expected calibration error -- that our method outperforms state-of-the-art baselines, and verify via ablation studies that its components do so independently.  ( 2 min )
    Learning ground states of gapped quantum Hamiltonians with Kernel Methods. (arXiv:2303.08902v1 [quant-ph])
    Neural network approaches to approximate the ground state of quantum hamiltonians require the numerical solution of a highly nonlinear optimization problem. We introduce a statistical learning approach that makes the optimization trivial by using kernel methods. Our scheme is an approximate realization of the power method, where supervised learning is used to learn the next step of the power iteration. We show that the ground state properties of arbitrary gapped quantum hamiltonians can be reached with polynomial resources under the assumption that the supervised learning is efficient. Using kernel ridge regression, we provide numerical evidence that the learning assumption is verified by applying our scheme to find the ground states of several prototypical interacting many-body quantum systems, both in one and two dimensions, showing the flexibility of our approach.  ( 2 min )
    Discrete-Time Nonlinear Feedback Linearization via Physics-Informed Machine Learning. (arXiv:2303.08884v1 [cs.LG])
    We present a physics-informed machine learning (PIML) scheme for the feedback linearization of nonlinear discrete-time dynamical systems. The PIML finds the nonlinear transformation law, thus ensuring stability via pole placement, in one step. In order to facilitate convergence in the presence of steep gradients in the nonlinear transformation law, we address a greedy-wise training procedure. We assess the performance of the proposed PIML approach via a benchmark nonlinear discrete map for which the feedback linearization transformation law can be derived analytically; the example is characterized by steep gradients, due to the presence of singularities, in the domain of interest. We show that the proposed PIML outperforms, in terms of numerical approximation accuracy, the traditional numerical implementation, which involves the construction--and the solution in terms of the coefficients of a power-series expansion--of a system of homological equations as well as the implementation of the PIML in the entire domain, thus highlighting the importance of continuation techniques in the training procedure of PIML.  ( 2 min )
    LRDB: LSTM Raw data DNA Base-caller based on long-short term models in an active learning environment. (arXiv:2303.08915v1 [q-bio.GN])
    The first important step in extracting DNA characters is using the output data of MinION devices in the form of electrical current signals. Various cutting-edge base callers use this data to detect the DNA characters based on the input. In this paper, we discuss several shortcomings of prior base callers in the case of time-critical applications, privacy-aware design, and the problem of catastrophic forgetting. Next, we propose the LRDB model, a lightweight open-source model for private developments with a better read-identity (0.35% increase) for the target bacterial samples in the paper. We have limited the extent of training data and benefited from the transfer learning algorithm to make the active usage of the LRDB viable in critical applications. Henceforth, less training time for adapting to new DNA samples (in our case, Bacterial samples) is needed. Furthermore, LRDB can be modified concerning the user constraints as the results show a negligible accuracy loss in case of using fewer parameters. We have also assessed the noise-tolerance property, which offers about a 1.439% decline in accuracy for a 15dB noise injection, and the performance metrics show that the model executes in a medium speed range compared with current cutting-edge models.  ( 2 min )
    EvalAttAI: A Holistic Approach to Evaluating Attribution Maps in Robust and Non-Robust Models. (arXiv:2303.08866v1 [cs.LG])
    The expansion of explainable artificial intelligence as a field of research has generated numerous methods of visualizing and understanding the black box of a machine learning model. Attribution maps are generally used to highlight the parts of the input image that influence the model to make a specific decision. On the other hand, the robustness of machine learning models to natural noise and adversarial attacks is also being actively explored. This paper focuses on evaluating methods of attribution mapping to find whether robust neural networks are more explainable. We explore this problem within the application of classification for medical imaging. Explainability research is at an impasse. There are many methods of attribution mapping, but no current consensus on how to evaluate them and determine the ones that are the best. Our experiments on multiple datasets (natural and medical imaging) and various attribution methods reveal that two popular evaluation metrics, Deletion and Insertion, have inherent limitations and yield contradictory results. We propose a new explainability faithfulness metric (called EvalAttAI) that addresses the limitations of prior metrics. Using our novel evaluation, we found that Bayesian deep neural networks using the Variational Density Propagation technique were consistently more explainable when used with the best performing attribution method, the Vanilla Gradient. However, in general, various types of robust neural networks may not be more explainable, despite these models producing more visually plausible attribution maps.  ( 3 min )
    Wireless Sensor Networks anomaly detection using Machine Learning: A Survey. (arXiv:2303.08823v1 [cs.LG])
    Wireless Sensor Networks (WSNs) have become increasingly valuable in various civil/military applications like industrial process control, civil engineering applications such as buildings structural strength monitoring, environmental monitoring, border intrusion, IoT (Internet of Things), and healthcare. However, the sensed data generated by WSNs is often noisy and unreliable, making it a challenge to detect and diagnose anomalies. Machine learning (ML) techniques have been widely used to address this problem by detecting and identifying unusual patterns in the sensed data. This survey paper provides an overview of the state of the art applications of ML techniques for data anomaly detection in WSN domains. We first introduce the characteristics of WSNs and the challenges of anomaly detection in WSNs. Then, we review various ML techniques such as supervised, unsupervised, and semi-supervised learning that have been applied to WSN data anomaly detection. We also compare different ML-based approaches and their performance evaluation metrics. Finally, we discuss open research challenges and future directions for applying ML techniques in WSNs sensed data anomaly detection.  ( 2 min )
    Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning. (arXiv:2303.08909v1 [cs.LG])
    Sequential decision making in the real world often requires finding a good balance of conflicting objectives. In general, there exist a plethora of Pareto-optimal policies that embody different patterns of compromises between objectives, and it is technically challenging to obtain them exhaustively using deep neural networks. In this work, we propose a novel multi-objective reinforcement learning (MORL) algorithm that trains a single neural network via policy gradient to approximately obtain the entire Pareto set in a single run of training, without relying on linear scalarization of objectives. The proposed method works in both continuous and discrete action spaces with no design change of the policy network. Numerical experiments in benchmark environments demonstrate the practicality and efficacy of our approach in comparison to standard MORL baselines.  ( 2 min )
    Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems. (arXiv:2303.08873v1 [cs.PL])
    Heterogeneity has become a mainstream architecture design choice for building High Performance Computing systems. However, heterogeneity poses significant challenges for achieving performance portability of execution. Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters. To address those challenges, this paper proposes new extensions to OpenMP for autonomous, machine learning-driven adaptation. Our solution includes a set of novel language constructs, compiler transformations, and runtime support. We propose a producer-consumer pattern to flexibly define multiple, different variants of OpenMP code regions to enable adaptation. Those regions are transparently profiled at runtime to autonomously learn optimizing machine learning models that dynamically select the fastest variant. Our approach significantly reduces users' efforts of programming adaptive applications on heterogeneous architectures by leveraging machine learning techniques and code generation capabilities of OpenMP compilation. Using a complete reference implementation in Clang/LLVM we evaluate three use-cases of adaptive CPU-GPU execution. Experiments with HPC proxy applications and benchmarks demonstrate that the proposed adaptive OpenMP extensions automatically choose the best performing code variants for various adaptation possibilities, in several different heterogeneous platforms of CPUs and GPUs.  ( 2 min )
    A Multifidelity deep operator network approach to closure for multiscale systems. (arXiv:2303.08893v1 [physics.comp-ph])
    Projection-based reduced order models (PROMs) have shown promise in representing the behavior of multiscale systems using a small set of generalized (or latent) variables. Despite their success, PROMs can be susceptible to inaccuracies, even instabilities, due to the improper accounting of the interaction between the resolved and unresolved scales of the multiscale system (known as the closure problem). In the current work, we interpret closure as a multifidelity problem and use a multifidelity deep operator network (DeepONet) framework to address it. In addition, to enhance the stability and/or accuracy of the multifidelity-based closure, we employ the recently developed "in-the-loop" training approach from the literature on coupling physics and machine learning models. The resulting approach is tested on shock advection for the one-dimensional viscous Burgers equation and vortex merging for the two-dimensional Navier-Stokes equations. The numerical experiments show significant improvement of the predictive ability of the closure-corrected PROM over the un-corrected one both in the interpolative and the extrapolative regimes.  ( 2 min )
    Boosting Convolutional Neural Networks' Protein Binding Site Prediction Capacity Using SE(3)-invariant transformers, Transfer Learning and Homology-based Augmentation. (arXiv:2303.08818v1 [q-bio.QM])
    Figuring out small molecule binding sites in target proteins, in the resolution of either pocket or residue, is critical in many virtual and real drug-discovery scenarios. Since it is not always easy to find such binding sites based on domain knowledge or traditional methods, different deep learning methods that predict binding sites out of protein structures have been developed in recent years. Here we present a new such deep learning algorithm, that significantly outperformed all state-of-the-art baselines in terms of the both resolutions$\unicode{x2013}$pocket and residue. This good performance was also demonstrated in a case study involving the protein human serum albumin and its binding sites. Our algorithm included new ideas both in the model architecture and in the training method. For the model architecture, it incorporated SE(3)-invariant geometric self-attention layers that operate on top of residue-level CNN outputs. This residue-level processing of the model allowed a transfer learning between the two resolutions, which turned out to significantly improve the binding pocket prediction. Moreover, we developed novel augmentation method based on protein homology, which prevented our model from over-fitting. Overall, we believe that our contribution to the literature is twofold. First, we provided a new computational method for binding site prediction that is relevant to real-world applications, as shown by the good performance on different benchmarks and case study. Second, the novel ideas in our method$\unicode{x2013}$the model architecture, transfer learning and the homology augmentation$\unicode{x2013}$would serve as useful components in future works.  ( 3 min )
    On the Benefits of Leveraging Structural Information in Planning Over the Learned Model. (arXiv:2303.08856v1 [cs.LG])
    Model-based Reinforcement Learning (RL) integrates learning and planning and has received increasing attention in recent years. However, learning the model can incur a significant cost (in terms of sample complexity), due to the need to obtain a sufficient number of samples for each state-action pair. In this paper, we investigate the benefits of leveraging structural information about the system in terms of reducing sample complexity. Specifically, we consider the setting where the transition probability matrix is a known function of a number of structural parameters, whose values are initially unknown. We then consider the problem of estimating those parameters based on the interactions with the environment. We characterize the difference between the Q estimates and the optimal Q value as a function of the number of samples. Our analysis shows that there can be a significant saving in sample complexity by leveraging structural information about the model. We illustrate the findings by considering several problems including controlling a queuing system with heterogeneous servers, and seeking an optimal path in a stochastic windy gridworld.  ( 2 min )
  • Open

    Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems. (arXiv:2007.03481v6 [cs.LG] UPDATED)
    This paper presents an inverse reinforcement learning~(IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify if these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality wrt the observed strategies. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function.To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search. As a real-world example, we illustrate using a YouTube dataset comprising metadata from 190000 videos how the proposed IRL method predicts user engagement in online multimedia platforms with high accuracy. Finally, for finite datasets, we propose an IRL detection algorithm and give finite sample bounds on its error probabilities.
    GLASU: A Communication-Efficient Algorithm for Federated Learning with Vertically Distributed Graph Data. (arXiv:2303.09531v1 [cs.LG])
    Vertical federated learning (VFL) is a distributed learning paradigm, where computing clients collectively train a model based on the partial features of the same set of samples they possess. Current research on VFL focuses on the case when samples are independent, but it rarely addresses an emerging scenario when samples are interrelated through a graph. For graph-structured data, graph neural networks (GNNs) are competitive machine learning models, but a naive implementation in the VFL setting causes a significant communication overhead. Moreover, the analysis of the training is faced with a challenge caused by the biased stochastic gradients. In this paper, we propose a model splitting method that splits a backbone GNN across the clients and the server and a communication-efficient algorithm, GLASU, to train such a model. GLASU adopts lazy aggregation and stale updates to skip aggregation when evaluating the model and skip feature exchanges during training, greatly reducing communication. We offer a theoretical analysis and conduct extensive numerical experiments on real-world datasets, showing that the proposed algorithm effectively trains a GNN model, whose performance matches that of the backbone GNN when trained in a centralized manner.
    Gradient flow on extensive-rank positive semi-definite matrix denoising. (arXiv:2303.09474v1 [stat.ML])
    In this work, we present a new approach to analyze the gradient flow for a positive semi-definite matrix denoising problem in an extensive-rank and high-dimensional regime. We use recent linear pencil techniques of random matrix theory to derive fixed point equations which track the complete time evolution of the matrix-mean-square-error of the problem. The predictions of the resulting fixed point equations are validated by numerical experiments. In this short note we briefly illustrate a few predictions of our formalism by way of examples, and in particular we uncover continuous phase transitions in the extensive-rank and high-dimensional regime, which connect to the classical phase transitions of the low-rank problem in the appropriate limit. The formalism has much wider applicability than shown in this communication.
    High-Dimensional Penalized Bernstein Support Vector Machines. (arXiv:2303.09066v1 [stat.ML])
    The support vector machines (SVM) is a powerful classifier used for binary classification to improve the prediction accuracy. However, the non-differentiability of the SVM hinge loss function can lead to computational difficulties in high dimensional settings. To overcome this problem, we rely on Bernstein polynomial and propose a new smoothed version of the SVM hinge loss called the Bernstein support vector machine (BernSVM), which is suitable for the high dimension $p >> n$ regime. As the BernSVM objective loss function is of the class $C^2$, we propose two efficient algorithms for computing the solution of the penalized BernSVM. The first algorithm is based on coordinate descent with maximization-majorization (MM) principle and the second one is IRLS-type algorithm (iterative re-weighted least squares). Under standard assumptions, we derive a cone condition and a restricted strong convexity to establish an upper bound for the weighted Lasso BernSVM estimator. Using a local linear approximation, we extend the latter result to penalized BernSVM with non convex penalties SCAD and MCP. Our bound holds with high probability and achieves a rate of order $\sqrt{s\log(p)/n}$, where $s$ is the number of active features. Simulation studies are considered to illustrate the prediction accuracy of BernSVM to its competitors and also to compare the performance of the two algorithms in terms of computational timing and error estimation. The use of the proposed method is illustrated through analysis of three large-scale real data examples.
    FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles. (arXiv:2206.06513v2 [cs.CG] UPDATED)
    Datasets with non-trivial large scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large scale topology. We formalize this point of view, and, as an application, we describe an algorithm which takes as input a dataset together with an initial representation of it in Euclidean space, assumed to recover part of its large scale topology, and outputs a new representation that integrates local representations, obtained through local linear dimensionality reduction, along the initial global representation. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in lower target dimension than various well known metric-based dimensionality reduction algorithms.
    Toroidal Coordinates: Decorrelating Circular Coordinates With Lattice Reduction. (arXiv:2212.07201v2 [cs.CG] UPDATED)
    The circular coordinates algorithm of de Silva, Morozov, and Vejdemo-Johansson takes as input a dataset together with a cohomology class representing a $1$-dimensional hole in the data; the output is a map from the data into the circle that captures this hole, and that is of minimum energy in a suitable sense. However, when applied to several cohomology classes, the output circle-valued maps can be "geometrically correlated" even if the chosen cohomology classes are linearly independent. It is shown in the original work that less correlated maps can be obtained with suitable integer linear combinations of the cohomology classes, with the linear combinations being chosen by inspection. In this paper, we identify a formal notion of geometric correlation between circle-valued maps which, in the Riemannian manifold case, corresponds to the Dirichlet form, a bilinear form derived from the Dirichlet energy. We describe a systematic procedure for constructing low energy torus-valued maps on data, starting from a set of linearly independent cohomology classes. We showcase our procedure with computational examples. Our main algorithm is based on the Lenstra--Lenstra--Lov\'asz algorithm from computational number theory.
    Towards Lower Bounds on the Depth of ReLU Neural Networks. (arXiv:2105.14835v4 [cs.LG] UPDATED)
    We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
    Bayesian Quadrature for Neural Ensemble Search. (arXiv:2303.08874v1 [stat.ML])
    Ensembling can improve the performance of Neural Networks, but existing approaches struggle when the architecture likelihood surface has dispersed, narrow peaks. Furthermore, existing methods construct equally weighted ensembles, and this is likely to be vulnerable to the failure modes of the weaker architectures. By viewing ensembling as approximately marginalising over architectures we construct ensembles using the tools of Bayesian Quadrature -- tools which are well suited to the exploration of likelihood surfaces with dispersed, narrow peaks. Additionally, the resulting ensembles consist of architectures weighted commensurate with their performance. We show empirically -- in terms of test likelihood, accuracy, and expected calibration error -- that our method outperforms state-of-the-art baselines, and verify via ablation studies that its components do so independently.
    On the Existence of a Complexity in Fixed Budget Bandit Identification. (arXiv:2303.09468v1 [stat.ML])
    In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time. It then answers a query about the set of distributions. A good algorithm will have a small probability of error. While that probability decreases exponentially with the final time, the best attainable rate is not known precisely for most identification tasks. We show that if a fixed budget task admits a complexity, defined as a lower bound on the probability of error which is attained by a single algorithm on all bandit problems, then that complexity is determined by the best non-adaptive sampling procedure for that problem. We show that there is no such complexity for several fixed budget identification tasks including Bernoulli best arm identification with two arms: there is no single algorithm that attains everywhere the best possible rate.
    Geometric Analysis of Noisy Low-rank Matrix Recovery in the Exact Parameterized and the Overparameterized Regimes. (arXiv:2105.08232v3 [math.OC] UPDATED)
    The matrix sensing problem is an important low-rank optimization problem that has found a wide range of applications, such as matrix completion, phase synchornization/retrieval, robust PCA, and power system state estimation. In this work, we focus on the general matrix sensing problem with linear measurements that are corrupted by random noise. We investigate the scenario where the search rank $r$ is equal to the true rank $r^*$ of the unknown ground truth (the exact parametrized case), as well as the scenario where $r$ is greater than $r^*$ (the overparametrized case). We quantify the role of the restricted isometry property (RIP) in shaping the landscape of the non-convex factorized formulation and assisting with the success of local search algorithms. First, we develop a global guarantee on the maximum distance between an arbitrary local minimizer of the non-convex problem and the ground truth under the assumption that the RIP constant is smaller than $1/(1+\sqrt{r^*/r})$. We then present a local guarantee for problems with an arbitrary RIP constant, which states that any local minimizer is either considerably close to the ground truth or far away from it. More importantly, we prove that this noisy, overparametrized problem exhibits the strict saddle property, which leads to the global convergence of perturbed gradient descent algorithm in polynomial time. The results of this work provide a comprehensive understanding of the geometric landscape of the matrix sensing problem in the noisy and overparametrized regime.
    PyVBMC: Efficient Bayesian inference in Python. (arXiv:2303.09519v1 [stat.ML])
    PyVBMC is a Python implementation of the Variational Bayesian Monte Carlo (VBMC) algorithm for posterior and model inference for black-box computational models (Acerbi, 2018, 2020). VBMC is an approximate inference method designed for efficient parameter estimation and model assessment when model evaluations are mildly-to-very expensive (e.g., a second or more) and/or noisy. Specifically, VBMC computes: - a flexible (non-Gaussian) approximate posterior distribution of the model parameters, from which statistics and posterior samples can be easily extracted; - an approximation of the model evidence or marginal likelihood, a metric used for Bayesian model selection. PyVBMC can be applied to any computational or statistical model with up to roughly 10-15 continuous parameters, with the only requirement that the user can provide a Python function that computes the target log likelihood of the model, or an approximation thereof (e.g., an estimate of the likelihood obtained via simulation or Monte Carlo methods). PyVBMC is particularly effective when the model takes more than about a second per evaluation, with dramatic speed-ups of 1-2 orders of magnitude when compared to traditional approximate inference methods. Extensive benchmarks on both artificial test problems and a large number of real models from the computational sciences, particularly computational and cognitive neuroscience, show that VBMC generally - and often vastly - outperforms alternative methods for sample-efficient Bayesian inference, and is applicable to both exact and simulator-based models (Acerbi, 2018, 2019, 2020). PyVBMC brings this state-of-the-art inference algorithm to Python, along with an easy-to-use Pythonic interface for running the algorithm and manipulating and visualizing its results.
    Identifiability Results for Multimodal Contrastive Learning. (arXiv:2303.09166v1 [cs.LG])
    Contrastive learning is a cornerstone underlying recent progress in multi-view and multimodal learning, e.g., in representation learning with image/caption pairs. While its effectiveness is not yet fully understood, a line of recent work reveals that contrastive learning can invert the data generating process and recover ground truth latent factors shared between views. In this work, we present new identifiability results for multimodal contrastive learning, showing that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously. Specifically, we distinguish between the multi-view setting with one generative mechanism (e.g., multiple cameras of the same type) and the multimodal setting that is characterized by distinct mechanisms (e.g., cameras and microphones). Our work generalizes previous identifiability results by redefining the generative process in terms of distinct mechanisms with modality-specific latent variables. We prove that contrastive learning can block-identify latent factors shared between modalities, even when there are nontrivial dependencies between factors. We empirically verify our identifiability results with numerical simulations and corroborate our findings on a complex multimodal dataset of image/text pairs. Zooming out, our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
    Noisy Low-rank Matrix Optimization: Geometry of Local Minima and Convergence Rate. (arXiv:2203.03899v3 [math.OC] UPDATED)
    This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning. This problem in the special case of matrix sensing has been studied extensively through the notion of Restricted Isometry Property (RIP), leading to a wealth of results on the geometric landscape of the problem and the convergence rate of common algorithms. However, the existing results can handle the problem in the case with a general objective function subject to noisy data only when the RIP constant is close to 0. In this paper, we develop a new mathematical framework to solve the above-mentioned problem with a far less restrictive RIP constant. We prove that as long as the RIP constant of the noiseless objective is less than $1/3$, any spurious local solution of the noisy optimization problem must be close to the ground truth solution. By working through the strict saddle property, we also show that an approximate solution can be found in polynomial time. We characterize the geometry of the spurious local minima of the problem in a local region around the ground truth in the case when the RIP constant is greater than $1/3$. Compared to the existing results in the literature, this paper offers the strongest RIP bound and provides a complete theoretical analysis on the global and local optimization landscapes of general low-rank optimization problems under random corruptions from any finite-variance family.
    Unsupervised domain adaptation by learning using privileged information. (arXiv:2303.09350v1 [cs.LG])
    Successful unsupervised domain adaptation (UDA) is guaranteed only under strong assumptions such as covariate shift and overlap between input domains. The latter is often violated in high-dimensional applications such as image classification which, despite this challenge, continues to serve as inspiration and benchmark for algorithm development. In this work, we show that access to side information about examples from the source and target domains can help relax these assumptions and increase sample efficiency in learning, at the cost of collecting a richer variable set. We call this domain adaptation by learning using privileged information (DALUPI). Tailored for this task, we propose a simple two-stage learning algorithm inspired by our analysis and a practical end-to-end algorithm for multi-label image classification. In a suite of experiments, including an application to medical image analysis, we demonstrate that incorporating privileged information in learning can reduce errors in domain transfer compared to classical learning.
    SRMD: Sparse Random Mode Decomposition. (arXiv:2204.06108v2 [eess.SP] UPDATED)
    Signal decomposition and multiscale signal analysis provide many useful tools for time-frequency analysis. We proposed a random feature method for analyzing time-series data by constructing a sparse approximation to the spectrogram. The randomization is both in the time window locations and the frequency sampling, which lowers the overall sampling and computational cost. The sparsification of the spectrogram leads to a sharp separation between time-frequency clusters which makes it easier to identify intrinsic modes, and thus leads to a new data-driven mode decomposition. The applications include signal representation, outlier removal, and mode decomposition. On the benchmark tests, we show that our approach outperforms other state-of-the-art decomposition methods.
    Sequential Gaussian Processes for Online Learning of Nonstationary Functions. (arXiv:1905.10003v4 [stat.ML] UPDATED)
    Many machine learning problems can be framed in the context of estimating functions, and often these are time-dependent functions that are estimated in real-time as observations arrive. Gaussian processes (GPs) are an attractive choice for modeling real-valued nonlinear functions due to their flexibility and uncertainty quantification. However, the typical GP regression model suffers from several drawbacks: 1) Conventional GP inference scales $O(N^{3})$ with respect to the number of observations; 2) Updating a GP model sequentially is not trivial; and 3) Covariance kernels typically enforce stationarity constraints on the function, while GPs with non-stationary covariance kernels are often intractable to use in practice. To overcome these issues, we propose a sequential Monte Carlo algorithm to fit infinite mixtures of GPs that capture non-stationary behavior while allowing for online, distributed inference. Our approach empirically improves performance over state-of-the-art methods for online GP estimation in the presence of non-stationarity in time-series data. To demonstrate the utility of our proposed online Gaussian process mixture-of-experts approach in applied settings, we show that we can sucessfully implement an optimization algorithm using online Gaussian process bandits.
    Bayesian Generalization Error in Linear Neural Networks with Concept Bottleneck Structure and Multitask Formulation. (arXiv:2303.09154v1 [stat.ML])
    Concept bottleneck model (CBM) is a ubiquitous method that can interpret neural networks using concepts. In CBM, concepts are inserted between the output layer and the last intermediate layer as observable values. This helps in understanding the reason behind the outputs generated by the neural networks: the weights corresponding to the concepts from the last hidden layer to the output layer. However, it has not yet been possible to understand the behavior of the generalization error in CBM since a neural network is a singular statistical model in general. When the model is singular, a one to one map from the parameters to probability distributions cannot be created. This non-identifiability makes it difficult to analyze the generalization performance. In this study, we mathematically clarify the Bayesian generalization error and free energy of CBM when its architecture is three-layered linear neural networks. We also consider a multitask problem where the neural network outputs not only the original output but also the concepts. The results show that CBM drastically changes the behavior of the parameter region and the Bayesian generalization error in three-layered linear neural networks as compared with the standard version, whereas the multitask formulation does not.
    On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits. (arXiv:2303.09390v1 [cs.LG])
    We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $\zeta$ is dominated by $\tilde O (\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O (d^2/\Delta)$ as in the well-specified setting up to logarithmic factors. In addition, we show that an existing algorithm SupLinUCB (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without the knowledge of sub-optimality gap $\Delta$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $\zeta \leq \tilde O(\Delta / \sqrt{d})$; and (2) it is not efficiently learnable when $\zeta \geq \tilde \Omega({\Delta} / {\sqrt{d}})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.
    Learning ground states of gapped quantum Hamiltonians with Kernel Methods. (arXiv:2303.08902v1 [quant-ph])
    Neural network approaches to approximate the ground state of quantum hamiltonians require the numerical solution of a highly nonlinear optimization problem. We introduce a statistical learning approach that makes the optimization trivial by using kernel methods. Our scheme is an approximate realization of the power method, where supervised learning is used to learn the next step of the power iteration. We show that the ground state properties of arbitrary gapped quantum hamiltonians can be reached with polynomial resources under the assumption that the supervised learning is efficient. Using kernel ridge regression, we provide numerical evidence that the learning assumption is verified by applying our scheme to find the ground states of several prototypical interacting many-body quantum systems, both in one and two dimensions, showing the flexibility of our approach.
    A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear Equality Constrained Optimization with Rank-Deficient Jacobians. (arXiv:2106.13015v2 [math.OC] UPDATED)
    A sequential quadratic optimization algorithm is proposed for solving smooth nonlinear equality constrained optimization problems in which the objective function is defined by an expectation of a stochastic function. The algorithmic structure of the proposed method is based on a step decomposition strategy that is known in the literature to be widely effective in practice, wherein each search direction is computed as the sum of a normal step (toward linearized feasibility) and a tangential step (toward objective decrease in the null space of the constraint Jacobian). However, the proposed method is unique from others in the literature in that it both allows the use of stochastic objective gradient estimates and possesses convergence guarantees even in the setting in which the constraint Jacobians may be rank deficient. The results of numerical experiments demonstrate that the algorithm offers superior performance when compared to popular alternatives.
    Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold. (arXiv:2303.09261v1 [math.OC])
    We consider the problem of minimizing a non-convex function over a smooth manifold $\mathcal{M}$. We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM) which only requires computing a projection onto a vector space. ODCGM is infeasible but the iterates are constantly pulled towards the manifold, ensuring the convergence of ODCGM towards $\mathcal{M}$. ODCGM is much simpler to implement than the classical methods which require the computation of a retraction. Moreover, we show that ODCGM exhibits the near-optimal oracle complexities $\mathcal{O}(1/\varepsilon^2)$ and $\mathcal{O}(1/\varepsilon^4)$ in the deterministic and stochastic cases, respectively. Furthermore, we establish that, under an appropriate choice of the projection metric, our method recovers the landing algorithm of Ablin and Peyr\'e (2022), a recently introduced algorithm for optimization over the Stiefel manifold. As a result, we significantly extend the analysis of Ablin and Peyr\'e (2022), establishing near-optimal rates both in deterministic and stochastic frameworks. Finally, we perform numerical experiments which shows the efficiency of ODCGM in a high-dimensional setting.
    Challenges and Opportunities in Quantum Machine Learning. (arXiv:2303.09491v1 [quant-ph])
    At the intersection of machine learning and quantum computing, Quantum Machine Learning (QML) has the potential of accelerating data analysis, especially for quantum data, with applications for quantum materials, biochemistry, and high-energy physics. Nevertheless, challenges remain regarding the trainability of QML models. Here we review current methods and applications for QML. We highlight differences between quantum and classical machine learning, with a focus on quantum neural networks and quantum deep learning. Finally, we discuss opportunities for quantum advantage with QML.
    Neural Networks Efficiently Learn Low-Dimensional Representations with SGD. (arXiv:2209.14863v2 [stat.ML] UPDATED)
    We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y=g(\langle\boldsymbol{u_1},\boldsymbol{x}\rangle,...,\langle\boldsymbol{u_k},\boldsymbol{x}\rangle)$ with a noisy link function $g$. We prove that the first-layer weights of the NN converge to the $k$-dimensional principal subspace spanned by the vectors $\boldsymbol{u_1},...,\boldsymbol{u_k}$ of the true model, when online SGD with weight decay is used for training. This phenomenon has several important consequences when $k \ll d$. First, by employing uniform convergence on this smaller subspace, we establish a generalization error bound of $O(\sqrt{{kd}/{T}})$ after $T$ iterations of SGD, which is independent of the width of the NN. We further demonstrate that, SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction, with a sample complexity linear in $d$ (up to log factors), where $f$ is a monotonic function with at most polynomial growth, and $\epsilon$ is the noise. This is in contrast to the known $d^{\Omega(p)}$ sample requirement to learn any degree $p$ polynomial in the kernel regime, and it shows that NNs trained with SGD can outperform the neural tangent kernel at initialization. Finally, we also provide compressibility guarantees for NNs using the approximate low-rank structure produced by SGD.
    Distributionally Robust Optimization using Cost-Aware Ambiguity Sets. (arXiv:2303.09408v1 [math.OC])
    We present a novel class of ambiguity sets for distributionally robust optimization (DRO). These ambiguity sets, called cost-aware ambiguity sets, are defined as halfspaces which depend on the cost function evaluated at an independent estimate of the optimal solution, thus excluding only those distributions that are expected to have significant impact on the obtained worst-case cost. We show that the resulting DRO method provides both a high-confidence upper bound and a consistent estimator of the out-of-sample expected cost, and demonstrate empirically that it results in less conservative solutions compared to divergence-based ambiguity sets.
    Combining Distance to Class Centroids and Outlier Discounting for Improved Learning with Noisy Labels. (arXiv:2303.09470v1 [cs.LG])
    In this paper, we propose a new approach for addressing the challenge of training machine learning models in the presence of noisy labels. By combining a clever usage of distance to class centroids in the items' latent space with a discounting strategy to reduce the importance of samples far away from all the class centroids (i.e., outliers), our method effectively addresses the issue of noisy labels. Our approach is based on the idea that samples farther away from their respective class centroid in the early stages of training are more likely to be noisy. We demonstrate the effectiveness of our method through extensive experiments on several popular benchmark datasets. Our results show that our approach outperforms the state-of-the-art in this area, achieving significant improvements in classification accuracy when the dataset contains noisy labels.
  • Open

    NVIDIA CEO to Reveal What’s Next for AI at GTC
    The secret’s out. Thanks to ChatGPT, everyone knows about the power of modern AI. To find out what’s coming next, tune in to NVIDIA founder and CEO Jensen Huang’s keynote address at NVIDIA GTC on Tuesday, March 21, at 8 a.m. Pacific. Huang will share his vision for the future of AI and how NVIDIA Read article >  ( 4 min )

  • Open

    Optimism Phase 2 Token Airdrop! | $OP
    submitted by /u/Wise-Hamster-9131 [link] [comments]  ( 41 min )
    A collection of research papers for Reinforcement Learning with Human Feedback (RLHF)
    submitted by /u/OpenDILab [link] [comments]  ( 41 min )
  • Open

    How was GPT4 trained on text-image pairs?
    What kind of dataset was used to train GPT4 on image to text pairs? In their demo they show the AI understanding what is inside an image. I wonder what kind of data was the AI given to be able to explain an image? (like, "here we see the text XYZ in the top corner, and ZWA in the middle of the image, with a dog on the right" etc) submitted by /u/nxtboyIII [link] [comments]  ( 41 min )
    With all of the recent developments with AI, this is worth a watch. Also I'm curious to here yalls thoughts.
    submitted by /u/johnaldmilligan [link] [comments]  ( 41 min )
    I am creatively paralyzed by ChatGPT - stuck in short term replaceability.
    I had literally hundreds of ideas for apps and websites using AI, each of them has been annihilated after 1 hour of research and 5 minutes of using ChatGPT-4. Many people are already building fitness apps, fashion apps, image recognition stuff, but how do they not see the inevitable? All this effort seems useless, all these can be done ALL IN ONE by a chat. We don't even need apps. A prompt is enough. What is the motivation, where do you find any of it in this moment? ​ We are all reasoning like it's a week after the first iPhone came out with the App Store and we are rushing through creating random ass apps and websites with it, without a real advantage. All we are doing is incapsulating some features and selling them in an uglier and less performant, costly, package, in some platform around the world. Why? How are you all not paralyzed by these obvious thoughts? submitted by /u/BetterProphet5585 [link] [comments]  ( 42 min )
    Watch while you're High - Trippy Mushroom AI Dream 196
    submitted by /u/LordPewPew777 [link] [comments]  ( 41 min )
    I created a website about various articles with over 7,000 posts with thumbnail in one week. Everything is automated and powered by GPT-3.5-Turbo, and under 40 dollars.
    Hello everyone. As i side project, I created a website that generated over 7,000 articles in one week, each with roughly 800 to 1000 words, all using the GPT 3.5 Turbo API in a fully automated manner. I created a Python script (also generated by the GPT) where I feed a list of topics, and it generates the content and automatically posts it on WordPress. In addition, I integrated the Google Images API to capture the image and also post it automatically. Currently, I can create around 10 posts per minute. And what about the cost? To generate these 7,000 posts with 7,000 images, I spent $40 so far! So far, however, I don't know how Google or Bing will handle this AI-generated content and if it will affect SEO, but I'm here to check it out. If you are interessed in how i did it and some videos, check my post: https://www.tigove.com/how/how-i-created-a-website-with-7000-post-with-chatgpt/ submitted by /u/maurimbr [link] [comments]  ( 42 min )
    ARK investment management predicted a 99% reduction in AI costs until 2030 (for GPT-3 like performance), Alpaca seems to have done it within 5 weeks of this prediction being published.
    submitted by /u/SuspiciousPillbox [link] [comments]  ( 41 min )
    NEW Microsoft AI Copilot VS Google AI. Who takes the win?
    Main Comparisons between Microsoft's AI Copilot & Google's AI-powered features: Microsoft's AI Copilot: 📅 Integrates with your calendar & email to prepare you for meetings 📄 Generates documents based on existing ones with context 🎨 Creates PowerPoint presentations with styling and pictures 📊 Uses natural language to analyze data in Excel 📝 Captures meeting notes, transcripts, and action items in Teams Google's AI-powered features: 📧 Drafts, replies, summarizes, and prioritizes emails in Gmail 📝 Brainstorms, proofreads, and rewrites content in Docs 🌟 Generates images, audio, and video in Slides 🔢 Autocompletes, generates formulas, and categorizes data in Sheets 📽️ Creates new backgrounds and captures notes in Meet 🗣️ Enables workflows in Chat The race between Microsoft and Google to integrate AI-driven features into their productivity suites is a testament to the power of competition in driving innovation. As both companies strive to outperform each other, they continue to push the boundaries of what's possible in AI-enhanced workflows. This rivalry ultimately benefits us end users. My main takeaway: Overall, Microsoft takes the ultimate win for AI content. The OpenAI investment of a staggering 10 billion has paid off. Google's Bard is nowhere near as close as GPT 3.5 and 4.0. However, for AI features in google workspace and Office, I think both companies have a chance to introduce some unique features that will benefit end users. My bottom line: ‼️ Healthy competition fuels innovation ‼️ submitted by /u/Small-Willingness432 [link] [comments]  ( 42 min )
    [GPT-4 POWERED] We’ve created a mobile IOS AI app that generates text, art, analyzes photos, and more!
    We’ve created a mobile IOS AI app that generates text, art, analyzes photos, and more! I'm the cofounder of a tech startup focused on providing free AI services, we're one of the first mobile all-in-one multipurpose AI apps. We've developed a pretty cool app that offers AI services like image generation, code generation, text generation, story generation, image captioning, and more for free. We're the Swiss Army knife of generative and analytical AI. Recently, we’ve released a new update called "ECF texting experience" that allows users to literally text the AI, and receive generated content based on what you text. The ECF texting experience can be accessed by going to the generate tab. Our analytical services can be accessed by going to the analyze screen. We'd love to have people try the app out, right now we have around 2,000 downloads and we'd like to expand our user base, get feedback, and keep in touch with all of you. We are INCREDIBLY responsive to user feedback at this stage, so recommend to us anything you'd like to see in the future. https://apps.apple.com/us/app/bright-eye/id1593932475 submitted by /u/Psychological_Ad4766 [link] [comments]  ( 7 min )
    GPT-4 Is EPIC - Build A Tetris Game In Seconds - Better Than ChatGPT - Code Refactor - How To Use - Thoroughly Tested GPT4
    submitted by /u/CeFurkan [link] [comments]  ( 41 min )
    Content-Aware ChatGPT on Any Website
    submitted by /u/MarkFulton [link] [comments]  ( 41 min )
    AI website mascots. Thought you guys would like this idea.
    submitted by /u/sidianmsjones [link] [comments]  ( 41 min )
    I used ChatGPT to write a terrible Op-Ed...and it got published
    https://medium.com/@wiroll/fake-news-chatbots-and-the-state-of-journalism-bf95c187e582 Basically...I (ChatGPT) wrote an op-ed with the essential hypothesis of, "let's double speeds in school zones in the name of safety" and...it got published...in a place I don't live...with no verification. Problematic? submitted by /u/KillBosby [link] [comments]  ( 41 min )
    Asking AI to Create Byzantine Iconograph - AI Dream 194
    submitted by /u/LordPewPew777 [link] [comments]  ( 41 min )
    GPT4 Image Generation - Generated Programmatically In C# Unity As Texture
    submitted by /u/TheRPGGamerMan [link] [comments]  ( 6 min )
    One AI Tutor Per Child: Personalized learning is finally here
    submitted by /u/Csai [link] [comments]  ( 41 min )
    As someone who struggles with social anxiety, AI messaging feels like a double-edged sword
    Well, for a person with social anxiety, this is a big yes for me. But ethically I'm really not sure if AI that writes messages and responses is something appropriate. Especially when they are used for ads or marketing. You spend your mental energy considering the message, you feel this connection that naturally appears in every communication, but it ends up it is written by a machine. Does anyone know if this topic is in the discussion? https://preview.redd.it/h6bcpjec05oa1.jpg?width=1080&format=pjpg&auto=webp&s=ba58f62e3a37fc6d688be34dc48c733fa49a1e24 submitted by /u/andosina [link] [comments]  ( 41 min )
    Can anyone explain what training time means for AI models?
    So with large language models they have some set amount of parameters and certain amount of data they train on. What i don't understand is what exactly does training time mean. Like naturally it takes some time to run the data through the system. But what does it mean to let the model train longer? Like whst extra additional things is it doing. Is it just going over the same things in a more precise way or what? Thank you very much for any answers! submitted by /u/HumanSeeing [link] [comments]  ( 41 min )
    What do people use to make AI rappers rap other rappers songs?
    Examples: (28) AI attempts to make Kanye West sing "What's Next" By Drake : GoodAssSub (reddit.com) (28) [AI YE] CASH TO BURN - Kanye West (feat. Ant Clemons) : GoodAssSub (reddit.com) Googling this stuff seems to only bring up outdated stuff submitted by /u/oliveroliv [link] [comments]  ( 41 min )
    GPT-4 given $100 and told to make as much money as possible
    submitted by /u/jaredigital62 [link] [comments]  ( 48 min )
    Universal Income Needed for a World Where AI Puts People Out of Work, TV Host Suggests
    submitted by /u/DCGirl20874 [link] [comments]  ( 41 min )
    Train custom AI models on spreadsheet data with just a few clicks
    submitted by /u/doofdoofdoof [link] [comments]  ( 43 min )
    Wordle? GPTordle! - The ChatGPT Word Guessing game
    submitted by /u/theluk246 [link] [comments]  ( 41 min )
    Looking for interview subject (Preferably Sydney)
    Hi guys, My name is Grace and I’m part of a group of student filmmakers looking for a subject to interview for our documentary project. The film is entitled Charles and the Real World and is an exploration of the boundaries of reality through artificial intelligence. Charles sets on a quest to discover what it means to be real in a world where the deeper you search, the further away the answer becomes. We are looking for someone (preferably based in Sydney, Australia) who has formed an emotional connection (romantic or platonic) with a Replika AI or similar and is willing to share their experiences and feelings, especially regarding the new updates in a safe, friendly and bias-free environment. If you are at all interested or would like more information, please contact me at [charlesandtherealworld@gmail.com](mailto:charlesandtherealworld@gmail.com) We look forward to hearing from you; your expertise would be greatly appreciated. submitted by /u/Sure_Fig2980 [link] [comments]  ( 42 min )
  • Open

    How to connect Text and Images
    Part 2: Understanding Zero-Shot Learning with the CLIP model  ( 11 min )
    How to Connect Text and Images
    Part 1: Understanding Zero-Shot Learning  ( 12 min )
    Understanding Graph Neural Network with hands-on example| Part-2
    Welcome back to the next part of this Blog Series on Graph Neural Networks!  ( 11 min )
    Understanding Graph Neural Network with hands-on example| Part-1
    Hello and welcome to this post, in which I will study a relatively new field in deep learning involving graphs — a very important and…  ( 13 min )
    AI Drug Discovery: How It’s Changing the Game
    AI drug discovery is exploding.  ( 18 min )
  • Open

    Responsible AI at Google Research: The Impact Lab
    Posted by Jamila Smith-Loud, Human Rights & Social Impact Research Lead, Google Research, Responsible AI and Human-Centered Technology Team Globalized technology has the potential to create large-scale societal impact, and having a grounded research approach rooted in existing international human and civil rights standards is a critical component to assuring responsible and ethical AI development and deployment. The Impact Lab team, part of Google’s Responsible AI Team, employs a range of interdisciplinary methodologies to ensure critical and rich analysis of the potential implications of technology development. The team’s mission is to examine socioeconomic and human rights impacts of AI, publish foundational research, and incubate novel mitigations enabling machine learning (ML) …  ( 93 min )
  • Open

    [R] Memory Augmented Large Language Models are Computationally Universal
    submitted by /u/MysteryInc152 [link] [comments]  ( 46 min )
    [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch
    We release the code to reproduce the pre-training of a "Large Language Model" (T5) under a limited budget (1xA100 GPU, ~20 hours) in PyTorch. We start from the randomly initialised T5-base-v1.1 (248M parameters) model implemented in HuggingFace. Next, we pre-train it on the English subset of the C4 dataset and then fine-tune it on Super-Natural Instructions (SNI). In ~20 hours on a single GPU, we achieve ~40 RougeL on the SNI test set, compared to ~42 RougeL of the original model available on HuggingFace Hub and pre-trained through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs. Our core contribution is not the T5 model itself, which follows the HuggingFace implementation. Instead, we optimise everything else in the training pipeline to offer you a user-friendly starting template for your NLP application/research. We are keen to hear your suggestions to improve the codebase further. ​ Github: https://github.com/PiotrNawrot/nanoT5 Twitter: https://twitter.com/p_nawrot/status/1636373725397520384 ​ https://preview.redd.it/zluas7u235oa1.png?width=1152&format=png&auto=webp&s=68d413aa702b2160785a9f95e5cb00318fbfcdb4 submitted by /u/korec1234 [link] [comments]  ( 44 min )
    [D] Creating an open platform for collecting corrective feedback on conversational ML products & projects
    Any applied scientist or engineer working with deep learning would tell you that corrective info/feedback is the only way up (especially for massive deep networks). With some of my fellows, we are watching what has been happening in the last couple of months with quite a shock and wonder as the community is throwing valuable feedback (for free) to a (closed source) company from all available online channels (e.g., Reddit, Twitter, Github, blogs), boosting their models (again for free). I should better note here that this is what they need to go from GPT-4 to v5, or GPTX (folks love X these days). There are also valuable calls to action from the community. As a start, community can attempt to create an open initiative to organize all the feedback thrown to open or closed (e.g., GPT-4) conversational models/papers/products. One may argue this will make companies' work even easier, but if there is a resource to mine, it will be mined. Therefore creating a (legal) initiative may work in the favor of the community. We should consider that conversational DL (maybe in the far future AI, not sure about that yet) becoming like the commercial aircraft industry where the only way to succeed is to fall and enforce what went wrong. Although the community is very excited now, folks may (probably will) get saturated, and the new companies and initiatives in the future may not get the same amount of feedback. This may create a monopoly, so it can be a better idea now to discuss the options how we can unify these valuable resources. submitted by /u/eksalant [link] [comments]  ( 45 min )
    [D] Comparison of the Model prediction uncertainty of two different models
    In your career as data scientists have you ever faced the situation where you have to compare the quality of the predictive uncertainty estimation of a machine learning model with an old statistical model that was already in use? if so, how did you do it? i have a bnn trained on some experimental data and a statistical models developed by my department that depends on some parameters estimated through the classic mcmc methods. Both seems to agree well with the experimental data but i wanted to compare the quality of the model predictive uncertainty ​ i thought about comparing the level of calibration of the uncertainty but i am not sure if i have to do it on the test dataset (due to the bnn) or the entire dataset ( due to the fact that for the old statistical model they use mcmc methods on the entire dataset) submitted by /u/ilrazziatore [link] [comments]  ( 45 min )
    [D] paperspace or another cloud service
    i'm working on colab, and i have 25GB of ram but that's not enough for my case and my notebook always end up crashing (only reading data, i haven't even started the training process). i considering passing to paperspace gradient but can i access my data stored in drive with it ? ( in colab it's easy to mount it), and if i move the data from drive to github repo can i access it then ? submitted by /u/maleeka02 [link] [comments]  ( 7 min )
    [N] bloomz.cpp: Run any BLOOM-like model in pure C++
    bloomz.cpp allows running inference of BLOOM-like models in pure C/C++ (inspired by llama.cpp). It supports all models that can be loaded with BloomForCausalLM.from_pretrained(). For example, you can achieve 16 tokens per second on a M1 Pro. submitted by /u/hackerllama [link] [comments]  ( 43 min )
    [D]Will the AI use and distribution be under strict government control?
    I mean, it is not too far fetched to imagine the governments will try to limit the use of AI for deepfakes etc., and mere possession of those AIs capable of those things, or distribution of tools capable of that, will end of the same spectrum as possession / distribution of child porn. I can easily see huge pushback for regulation once we get to stage where everyone can run AIs on their home computers with minimal setup and they will became so good at generating stuff it will not be distinguishable from the real thing. submitted by /u/Adequately_Insane [link] [comments]  ( 45 min )
    [N] A $250k contest to read ancient Roman papyrus scrolls with ML
    Today we launched the Vesuvius Challenge, an open competition to read a set of charred papyrus scrolls that were buried by the eruption of Mount Vesuvius 2000 years ago. The scrolls can't be physically opened, but we have released 3d tomographic x-ray scans of two of them at 8µm resolution. The scans were made at a particle accelerator. A team at UKY led by Prof Brent Seales has very recently demonstrated the ability to detect ink inside the CT scans using CNNs, and so we believe that it is possible for the first time in history to read what's in these scrolls without opening them. There are hundreds of carbonized scrolls that we could read once the technique works – enough to more than double our total corpus of literature from antiquity. Many of us are fans of /r/MachineLearning and we thought this group would be interested in hearing about it! submitted by /u/nat_friedman [link] [comments]  ( 47 min )
    In your experience, are AI Ethics teams valuable/effective? [D]
    Hello! I read the following article about Microsoft laying off their AI Ethics team: https://www.cmswire.com/customer-experience/microsoft-cuts-ai-ethics-and-society-team-as-part-of-layoffs/ In your experience, what value do AI ethics teams add? Do they actually add useful insight, or do they serve more as a PR thing? I’ve heard conflicting anecdotes for each side. Is there anything you think AI ethics as a field can do to be more useful and to get more change? Thanks! submitted by /u/namey-name-name [link] [comments]  ( 54 min )
  • Open

    Two-letter abbreviations.
    Countries and regions have two letter abbreviations (ISO 3166-1) as do languages (ISO 639-1). So do chemical elements, US states, and books filed using the Library of Congress system. I was curious how many of the 676 possible two-letter combinations are used by the abbreviation systems above. About two thirds, not as many as I […] Two-letter abbreviations. first appeared on John D. Cook.  ( 4 min )
    Interpolating rotations with SLERP
    Naive interpolation of rotation matrices does not produce a rotation matrix. That is, if R1 and R2 are rotation (orthogonal) matrices and 0 < t < 1, then is not in general a rotation matrix. You can represent rotations with unit quaternions rather than orthogonal matrices (see details here), so a reasonable approach might be […] Interpolating rotations with SLERP first appeared on John D. Cook.  ( 5 min )
  • Open

    Thoroughly Spell-Checking a PhD Thesis
    I always used aspell to spell check papers. I did not care about setting up a dictionary of words that aspell does not recognize due to time pressure. As papers usually involve few LaTeX files, this was an OK process. For my PhD thesis, however, I needed a more automatic and thorough process. This is because more files are involved and I had to spell check multiple times, several weeks or months apart, throughout the process. In this article, I want to share a semi-automatic but thorough process based on aspell and TeXtidote that worked well for me. The post Thoroughly Spell-Checking a PhD Thesis appeared first on David Stutz.  ( 5 min )
  • Open

    NVIDIA Canvas 1.4 Available With Panorama Beta This Week ‘In the NVIDIA Studio’
    An update is now available for NVIDIA Canvas, the free beta app that harnesses the power of AI to help artists quickly turn simple brushstrokes into realistic landscapes.  ( 6 min )
    Game Like a PC: GeForce NOW Breaks Boundaries Transforming Macs Into Ultimate Gaming PCs
    Disney Dreamlight Valley is streaming from Steam and Epic Games Store on GeForce NOW starting today. It’s one of two new games this week that members can stream with beyond-fast performance using a GeForce NOW Ultimate membership. Game as if using a PC on any device — at up to 4K resolution and 120 frames Read article >  ( 5 min )
    Peter Ma on Using AI to Find Promising Signals of Alien Life
    Peter Ma was bored in his high school computer science class. So he decided to teach himself something new: how to use artificial intelligence to find alien life. That’s how he eventually became the lead author of a groundbreaking study published in Nature Astronomy. The study reveals how Ma and his co-authors used AI to Read article >  ( 4 min )
  • Open

    Alpaca - Train Your GPT-4 for Less Than $100
    submitted by /u/deeplearningperson [link] [comments]  ( 41 min )
  • Open

    What Makes Python a Quick Pick for Data Analysis and Data Science?
    Python comes across as an object-oriented high-level programming language with dynamic semantics that allows rapid application development. It has become a general-purpose programming language for a number of reasons.  It is the ready pick for data science enthusiasts; who look forward to majoring in the field with the requisite essentials. Not just that, Python has… Read More »What Makes Python a Quick Pick for Data Analysis and Data Science? The post What Makes Python a Quick Pick for Data Analysis and Data Science? appeared first on Data Science Central.  ( 20 min )
  • Open

    Self-supervised based general laboratory progress pretrained model for cardiovascular event detection. (arXiv:2303.06980v2 [cs.LG] UPDATED)
    Regular surveillance is an indispensable aspect of managing cardiovascular disorders. Patient recruitment for rare or specific diseases is often limited due to their small patient size and episodic observations, whereas prevalent cases accumulate longitudinal data easily due to regular follow-ups. These data, however, are notorious for their irregularity, temporality, absenteeism, and sparsity. In this study, we leveraged self-supervised learning (SSL) and transfer learning to overcome the above-mentioned barriers, transferring patient progress trends in cardiovascular laboratory parameters from prevalent cases to rare or specific cardiovascular events detection. We pretrained a general laboratory progress (GLP) pretrain model using hypertension patients (who were yet to be diabetic), and transferred their laboratory progress trend to assist in detecting target vessel revascularization (TVR) in percutaneous coronary intervention patients. GLP adopted a two-stage training process that utilized interpolated data, enhancing the performance of SSL. After pretraining GLP, we fine-tuned it for TVR prediction. The proposed two-stage training process outperformed SSL. Upon processing by GLP, the classification demonstrated a marked improvement, increasing from 0.63 to 0.90 in averaged accuracy. All metrics were significantly superior (p < 0.01) to the performance of prior GLP processing. The representation displayed distinct separability independent of algorithmic mechanisms, and diverse data distribution trend. Our approach effectively transferred the progression trends of cardiovascular laboratory parameters from prevalent cases to small-numbered cases, thereby demonstrating its efficacy in aiding the risk assessment of cardiovascular events without limiting to episodic observation. The potential for extending this approach to other laboratory tests and diseases is promising.
    Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA Models. (arXiv:2207.06950v3 [stat.ML] UPDATED)
    Low-order functional ANOVA (fANOVA) models have been rediscovered in the machine learning (ML) community under the guise of inherently interpretable machine learning. Explainable Boosting Machines or EBM (Lou et al. 2013) and GAMI-Net (Yang et al. 2021) are two recently proposed ML algorithms for fitting functional main effects and second-order interactions. We propose a new algorithm, called GAMI-Tree, that is similar to EBM, but has a number of features that lead to better performance. It uses model-based trees as base learners and incorporates a new interaction filtering method that is better at capturing the underlying interactions. In addition, our iterative training method converges to a model with better predictive performance, and the embedded purification ensures that interactions are hierarchically orthogonal to main effects. The algorithm does not need extensive tuning, and our implementation is fast and efficient. We use simulated and real datasets to compare the performance and interpretability of GAMI-Tree with EBM and GAMI-Net.
    Policy learning "without'' overlap: Pessimism and generalized empirical Bernstein's inequality. (arXiv:2212.09900v2 [cs.LG] UPDATED)
    This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn the optimal individualized decision rule in a given class. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics are lower bounded in the offline dataset. In other words, the performance of these methods depends on the worst-case propensity in the offline dataset. As one has no control over the data collection process, this assumption can be unrealistic in many situations, especially when the behavior policies are allowed to evolve over time with diminishing propensities. In this paper, we propose a new algorithm that optimizes lower confidence bounds (LCBs) -- instead of point estimates -- of the policy values. The LCBs are constructed by quantifying the estimation uncertainty of the augmented inverse propensity weighted (AIPW)-type estimators using knowledge of the behavior policies for collecting the offline data. Without assuming any uniform overlap condition, we establish a data-dependent upper bound for the suboptimality of our algorithm, which depends only on (i) the overlap for the optimal policy, and (ii) the complexity of the policy class. As an implication, for adaptively collected data, we ensure efficient policy learning as long as the propensities for optimal actions are lower bounded over time, while those for suboptimal ones are allowed to diminish arbitrarily fast. In our theoretical analysis, we develop a new self-normalized concentration inequality for IPW estimators, generalizing the well-known empirical Bernstein's inequality to unbounded and non-i.i.d. data.
    Artificial Influence: An Analysis Of AI-Driven Persuasion. (arXiv:2303.08721v1 [cs.CY])
    Persuasion is a key aspect of what it means to be human, and is central to business, politics, and other endeavors. Advancements in artificial intelligence (AI) have produced AI systems that are capable of persuading humans to buy products, watch videos, click on search results, and more. Even systems that are not explicitly designed to persuade may do so in practice. In the future, increasingly anthropomorphic AI systems may form ongoing relationships with users, increasing their persuasive power. This paper investigates the uncertain future of persuasive AI systems. We examine ways that AI could qualitatively alter our relationship to and views regarding persuasion by shifting the balance of persuasive power, allowing personalized persuasion to be deployed at scale, powering misinformation campaigns, and changing the way humans can shape their own discourse. We consider ways AI-driven persuasion could differ from human-driven persuasion. We warn that ubiquitous highlypersuasive AI systems could alter our information environment so significantly so as to contribute to a loss of human control of our own future. In response, we examine several potential responses to AI-driven persuasion: prohibition, identification of AI agents, truthful AI, and legal remedies. We conclude that none of these solutions will be airtight, and that individuals and governments will need to take active steps to guard against the most pernicious effects of persuasive AI.
    Generalized Kernel Regularized Least Squares. (arXiv:2209.14355v3 [stat.ML] UPDATED)
    Kernel Regularized Least Squares (KRLS) is a popular method for flexibly estimating models that may have complex relationships between variables. However, its usefulness to many researchers is limited for two reasons. First, existing approaches are inflexible and do not allow KRLS to be combined with theoretically-motivated extensions such as random effects, unregularized fixed effects, or non-Gaussian outcomes. Second, estimation is extremely computationally intensive for even modestly sized datasets. Our paper addresses both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be re-formulated as a hierarchical model thereby allowing easy inference and modular model construction where KRLS can be used alongside random effects, splines, and unregularized fixed effects. Computationally, we also implement random sketching to dramatically accelerate estimation while incurring a limited penalty in estimation quality. We demonstrate that gKRLS can be fit on datasets with tens of thousands of observations in under one minute. Further, state-of-the-art techniques that require fitting the model over a dozen times (e.g. meta-learners) can be estimated quickly.
    Robust online active learning. (arXiv:2302.00422v2 [stat.ML] UPDATED)
    In many industrial applications, obtaining labeled observations is not straightforward as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sample outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.
    FretNet: Continuous-Valued Pitch Contour Streaming for Polyphonic Guitar Tablature Transcription. (arXiv:2212.03023v2 [eess.AS] UPDATED)
    In recent years, the task of Automatic Music Transcription (AMT), whereby various attributes of music notes are estimated from audio, has received increasing attention. At the same time, the related task of Multi-Pitch Estimation (MPE) remains a challenging but necessary component of almost all AMT approaches, even if only implicitly. In the context of AMT, pitch information is typically quantized to the nominal pitches of the Western music scale. Even in more general contexts, MPE systems typically produce pitch predictions with some degree of quantization. In certain applications of AMT, such as Guitar Tablature Transcription (GTT), it is more meaningful to estimate continuous-valued pitch contours. Guitar tablature has the capacity to represent various playing techniques, some of which involve pitch modulation. Contemporary approaches to AMT do not adequately address pitch modulation, and offer only less quantization at the expense of more model complexity. In this paper, we present a GTT formulation that estimates continuous-valued pitch contours, grouping them according to their string and fret of origin. We demonstrate that for this task, the proposed method significantly improves the resolution of MPE and simultaneously yields tablature estimation results competitive with baseline models.
    Offline Learning of Closed-Loop Deep Brain Stimulation Controllers for Parkinson Disease Treatment. (arXiv:2302.02477v3 [cs.LG] UPDATED)
    Deep brain stimulation (DBS) has shown great promise toward treating motor symptoms caused by Parkinson's disease (PD), by delivering electrical pulses to the Basal Ganglia (BG) region of the brain. However, DBS devices approved by the U.S. Food and Drug Administration (FDA) can only deliver continuous DBS (cDBS) stimuli at a fixed amplitude; this energy inefficient operation reduces battery lifetime of the device, cannot adapt treatment dynamically for activity, and may cause significant side-effects (e.g., gait impairment). In this work, we introduce an offline reinforcement learning (RL) framework, allowing the use of past clinical data to train an RL policy to adjust the stimulation amplitude in real time, with the goal of reducing energy use while maintaining the same level of treatment (i.e., control) efficacy as cDBS. Moreover, clinical protocols require the safety and performance of such RL controllers to be demonstrated ahead of deployments in patients. Thus, we also introduce an offline policy evaluation (OPE) method to estimate the performance of RL policies using historical data, before deploying them on patients. We evaluated our framework on four PD patients equipped with the RC+S DBS system, employing the RL controllers during monthly clinical visits, with the overall control efficacy evaluated by severity of symptoms (i.e., bradykinesia and tremor), changes in PD biomakers (i.e., local field potentials), and patient ratings. The results from clinical experiments show that our RL-based controller maintains the same level of control efficacy as cDBS, but with significantly reduced stimulation energy. Further, the OPE method is shown effective in accurately estimating and ranking the expected returns of RL controllers.
    Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model. (arXiv:2303.08572v1 [cs.LG])
    Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, namely additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (living in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC channel. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.
    Deep Calibration With Artificial Neural Network: A Performance Comparison on Option Pricing Models. (arXiv:2303.08760v1 [q-fin.MF])
    This paper explores Artificial Neural Network (ANN) as a model-free solution for a calibration algorithm of option pricing models. We construct ANNs to calibrate parameters for two well-known GARCH-type option pricing models: Duan's GARCH and the classical tempered stable GARCH that significantly improve upon the limitation of the Black-Scholes model but have suffered from computation complexity. To mitigate this technical difficulty, we train ANNs with a dataset generated by Monte Carlo Simulation (MCS) method and apply them to calibrate optimal parameters. The performance results indicate that the ANN approach consistently outperforms MCS and takes advantage of faster computation times once trained. The Greeks of options are also discussed.
    WikiCoder: Learning to Write Knowledge-Powered Code. (arXiv:2303.08574v1 [cs.LG])
    We tackle the problem of automatic generation of computer programs from a few pairs of input-output examples. The starting point of this work is the observation that in many applications a solution program must use external knowledge not present in the examples: we call such programs knowledge-powered since they can refer to information collected from a knowledge graph such as Wikipedia. This paper makes a first step towards knowledge-powered program synthesis. We present WikiCoder, a system building upon state of the art machine-learned program synthesizers and integrating knowledge graphs. We evaluate it to show its wide applicability over different domains and discuss its limitations. WikiCoder solves tasks that no program synthesizers were able to solve before thanks to the use of knowledge graphs, while integrating with recent developments in the field to operate at scale.
    Exploiting 4D CT Perfusion for segmenting infarcted areas in patients with suspected acute ischemic stroke. (arXiv:2303.08757v1 [eess.IV])
    Precise and fast prediction methods for ischemic areas (core and penumbra) in acute ischemic stroke (AIS) patients are of significant clinical interest: they play an essential role in improving diagnosis and treatment planning. Computed Tomography (CT) scan is one of the primary modalities for early assessment in patients with suspected AIS. CT Perfusion (CTP) is often used as a primary assessment to determine stroke location, severity, and volume of ischemic lesions. Current automatic segmentation methods for CTP mostly use already processed 3D color maps conventionally used for visual assessment by radiologists as input. Alternatively, the raw CTP data is used on a slice-by-slice basis as 2D+time input, where the spatial information over the volume is ignored. In this paper, we investigate different methods to utilize the entire 4D CTP as input to fully exploit the spatio-temporal information. This leads us to propose a novel 4D convolution layer. Our comprehensive experiments on a local dataset comprised of 152 patients divided into three groups show that our proposed models generate more precise results than other methods explored. A Dice Coefficient of 0.70 and 0.45 is achieved for penumbra and core areas, respectively. The code is available on https://github.com/Biomedical-Data-Analysis-Laboratory/4D-mJ-Net.git.
    A Cross-institutional Evaluation on Breast Cancer Phenotyping NLP Algorithms on Electronic Health Records. (arXiv:2303.08448v1 [cs.CL])
    Objective: The generalizability of clinical large language models is usually ignored during the model development process. This study evaluated the generalizability of BERT-based clinical NLP models across different clinical settings through a breast cancer phenotype extraction task. Materials and Methods: Two clinical corpora of breast cancer patients were collected from the electronic health records from the University of Minnesota and the Mayo Clinic, and annotated following the same guideline. We developed three types of NLP models (i.e., conditional random field, bi-directional long short-term memory and CancerBERT) to extract cancer phenotypes from clinical texts. The models were evaluated for their generalizability on different test sets with different learning strategies (model transfer vs. locally trained). The entity coverage score was assessed with their association with the model performances. Results: We manually annotated 200 and 161 clinical documents at UMN and MC, respectively. The corpora of the two institutes were found to have higher similarity between the target entities than the overall corpora. The CancerBERT models obtained the best performances among the independent test sets from two clinical institutes and the permutation test set. The CancerBERT model developed in one institute and further fine-tuned in another institute achieved reasonable performance compared to the model developed on local data (micro-F1: 0.925 vs 0.932). Conclusions: The results indicate the CancerBERT model has the best learning ability and generalizability among the three types of clinical NLP models. The generalizability of the models was found to be correlated with the similarity of the target entities between the corpora.
    Robust and Provably Monotonic Networks. (arXiv:2112.00038v2 [cs.LG] UPDATED)
    The Lipschitz constant of the map between the input and output space represented by a neural network is a natural metric for assessing the robustness of the model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme during training that ensures the Lipschitz constant of every layer is below an upper limit specified by the analyst. A simple monotonic residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where domain knowledge dictates such dependence. Examples can be found in algorithmic fairness requirements or, as presented here, in the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider. Our normalization is minimally constraining and allows the underlying architecture to maintain higher expressiveness compared to other techniques which aim to either control the Lipschitz constant of the model or ensure its monotonicity. We show how the algorithm was used to train a powerful, robust, and interpretable discriminator for heavy-flavor-quark decays, which has been adopted for use as the primary data-selection algorithm in the LHCb real-time data-processing system in the current LHC data-taking period known as Run 3. In addition, our algorithm has also achieved state-of-the-art performance on benchmarks in medicine, finance, and other applications.
    Automated patent extraction powers generative modeling in focused chemical spaces. (arXiv:2303.08272v1 [physics.chem-ph])
    Deep generative models have emerged as an exciting avenue for inverse molecular design, with progress coming from the interplay between training algorithms and molecular representations. One of the key challenges in their applicability to materials science and chemistry has been the lack of access to sizeable training datasets with property labels. Published patents contain the first disclosure of new materials prior to their publication in journals, and are a vast source of scientific knowledge that has remained relatively untapped in the field of data-driven molecular design. Because patents are filed seeking to protect specific uses, molecules in patents can be considered to be weakly labeled into application classes. Furthermore, patents published by the US Patent and Trademark Office (USPTO) are downloadable and have machine-readable text and molecular structures. In this work, we train domain-specific generative models using patent data sources by developing an automated pipeline to go from USPTO patent digital files to the generation of novel candidates with minimal human intervention. We test the approach on two in-class extracted datasets, one in organic electronics and another in tyrosine kinase inhibitors. We then evaluate the ability of generative models trained on these in-class datasets on two categories of tasks (distribution learning and property optimization), identify strengths and limitations, and suggest possible explanations and remedies that could be used to overcome these in practice.
    Systematic design space exploration by learning the explored space using Machine Learning. (arXiv:2303.08249v1 [cs.LG])
    Current practice in parameter space exploration in euclidean space is dominated by randomized sampling or design of experiment methods. The biggest issue with these methods is not keeping track of what part of parameter space has been explored and what has not. In this context, we utilize the geometric learning of explored data space using modern machine learning methods to keep track of already explored regions and samples from the regions that are unexplored. For this purpose, we use a modified version of a robust random-cut forest along with other heuristic-based approaches. We demonstrate our method and its progression in two-dimensional Euclidean space but it can be extended to any dimension since the underlying method is generic.
    Efficient and Secure Federated Learning for Financial Applications. (arXiv:2303.08355v1 [cs.LG])
    The conventional machine learning (ML) and deep learning approaches need to share customers' sensitive information with an external credit bureau to generate a prediction model that opens the door to privacy leakage. This leakage risk makes financial companies face an enormous challenge in their cooperation. Federated learning is a machine learning setting that can protect data privacy, but the high communication cost is often the bottleneck of the federated systems, especially for large neural networks. Limiting the number and size of communications is necessary for the practical training of large neural structures. Gradient sparsification has received increasing attention as a method to reduce communication cost, which only updates significant gradients and accumulates insignificant gradients locally. However, the secure aggregation framework cannot directly use gradient sparsification. This article proposes two sparsification methods to reduce communication cost in federated learning. One is a time-varying hierarchical sparsification method for model parameter update, which solves the problem of maintaining model accuracy after high ratio sparsity. It can significantly reduce the cost of a single communication. The other is to apply the sparsification method to the secure aggregation framework. We sparse the encryption mask matrix to reduce the cost of communication while protecting privacy. Experiments show that under different Non-IID experiment settings, our method can reduce the upload communication cost to about 2.9% to 18.9% of the conventional federated learning algorithm when the sparse rate is 0.01.
    Graph Neural Network Surrogates of Fair Graph Filtering. (arXiv:2303.08157v1 [cs.LG])
    Graph filters that transform prior node values to posterior scores via edge propagation often support graph mining tasks affecting humans, such as recommendation and ranking. Thus, it is important to make them fair in terms of satisfying statistical parity constraints between groups of nodes (e.g., distribute score mass between genders proportionally to their representation). To achieve this while minimally perturbing the original posteriors, we introduce a filter-aware universal approximation framework for posterior objectives. This defines appropriate graph neural networks trained at runtime to be similar to filters but also locally optimize a large class of objectives, including fairness-aware ones. Experiments on a collection of 8 filters and 5 graphs show that our approach performs equally well or better than alternatives in meeting parity constraints while preserving the AUC of score-based community member recommendation and creating minimal utility loss in prior diffusion.
    Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection. (arXiv:2303.08216v1 [eess.IV])
    Neuroimaging of large populations is valuable to identify factors that promote or resist brain disease, and to assist diagnosis, subtyping, and prognosis. Data-driven models such as convolutional neural networks (CNNs) have increasingly been applied to brain images to perform diagnostic and prognostic tasks by learning robust features. Vision transformers (ViT) - a new class of deep learning architectures - have emerged in recent years as an alternative to CNNs for several computer vision applications. Here we tested variants of the ViT architecture for a range of desired neuroimaging downstream tasks based on difficulty, in this case for sex and Alzheimer's disease (AD) classification based on 3D brain MRI. In our experiments, two vision transformer architecture variants achieved an AUC of 0.987 for sex and 0.892 for AD classification, respectively. We independently evaluated our models on data from two benchmark AD datasets. We achieved a performance boost of 5% and 9-10% upon fine-tuning vision transformer models pre-trained on synthetic (generated by a latent diffusion model) and real MRI scans, respectively. Our main contributions include testing the effects of different ViT training strategies including pre-training, data augmentation and learning rate warm-ups followed by annealing, as pertaining to the neuroimaging domain. These techniques are essential for training ViT-like models for neuroimaging applications where training data is usually limited. We also analyzed the effect of the amount of training data utilized on the test-time performance of the ViT via data-model scaling curves.
    Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift. (arXiv:2302.10160v2 [stat.ME] UPDATED)
    We develop and analyze a principled approach to kernel ridge regression under covariate shift. The goal is to learn a regression function with small mean squared error over a target distribution, based on unlabeled data from there and labeled data that may have a different feature distribution. We propose to split the labeled data into two subsets and conduct kernel ridge regression on them separately to obtain a collection of candidate models and an imputation model. We use the latter to fill the missing labels and then select the best candidate model accordingly. Our non-asymptotic excess risk bounds show that in quite general scenarios, our estimator adapts to the structure of the target distribution as well as the covariate shift. It achieves the minimax optimal error rate up to a logarithmic factor. The use of pseudo-labels in model selection does not have major negative impacts.
    Measuring The Impact Of Programming Language Distribution. (arXiv:2302.01973v2 [cs.LG] UPDATED)
    Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. BabelCode enables new investigations into the qualitative performance of models' memory, runtime, and individual test case results. Additionally, we present a new code translation dataset called Translating Python Programming Puzzles (TP3) from the Python Programming Puzzles (Schuster et al. 2021) benchmark that involves translating expert-level python functions to any language. With both BabelCode and the TP3 benchmark, we investigate if balancing the distributions of 14 languages in a training dataset improves a large language model's performance on low-resource languages. Training a model on a balanced corpus results in, on average, 12.34% higher $pass@k$ across all tasks and languages compared to the baseline. We find that this strategy achieves 66.48% better $pass@k$ on low-resource languages at the cost of only a 12.94% decrease to high-resource languages. In our three translation tasks, this strategy yields, on average, 30.77% better low-resource $pass@k$ while having 19.58% worse high-resource $pass@k$.
    Similarity of Neural Architectures Based on Input Gradient Transferability. (arXiv:2210.11407v2 [cs.LG] UPDATED)
    In recent years, a huge amount of deep neural architectures have been developed for image classification. It remains curious whether these models are similar or different and what factors contribute to their similarities or differences. To address this question, we aim to design a quantitative and scalable similarity function between neural architectures. We utilize adversarial attack transferability, which has information related to input gradients and decision boundaries that are widely used to understand model behaviors. We conduct a large-scale analysis on 69 state-of-the-art ImageNet classifiers using our proposed similarity function to answer the question. Moreover, we observe neural architecture-related phenomena using model similarity that model diversity can lead to better performance on model ensembles and knowledge distillation under specific conditions. Our results provide insights into why the development of diverse neural architectures with distinct components is necessary.
    Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism. (arXiv:2212.13703v2 [eess.AS] UPDATED)
    This paper proposes a novel sequence-to-sequence (seq2seq) model with a musical note position-aware attention mechanism for singing voice synthesis (SVS). A seq2seq modeling approach that can simultaneously perform acoustic and temporal modeling is attractive. However, due to the difficulty of the temporal modeling of singing voices, many recent SVS systems with an encoder-decoder-based model still rely on explicitly on duration information generated by additional modules. Although some studies perform simultaneous modeling using seq2seq models with an attention mechanism, they have insufficient robustness against temporal modeling. The proposed attention mechanism is designed to estimate the attention weights by considering the rhythm given by the musical score. Furthermore, several techniques are also introduced to improve the modeling performance of the singing voice. Experimental results indicated that the proposed model is effective in terms of both naturalness and robustness of timing.
    Generalization in Neural Networks: A Broad Survey. (arXiv:2209.01610v2 [cs.LG] UPDATED)
    This paper reviews concepts, modeling approaches, and recent findings along a spectrum of different levels of abstraction of neural network models including generalization across (1) Samples, (2) Distributions, (3) Domains, (4) Tasks, (5) Modalities, and (6) Scopes. Results on (1) sample generalization show that, in the case of ImageNet, nearly all the recent improvements reduced training error while overfitting stayed flat; with nearly all the training error eliminated, future progress will require a focus on reducing overfitting. Perspectives from statistics highlight how (2) distribution generalization can be viewed alternately as a change in sample weights or a change in the input-output relationship; thus, techniques that have been successful in domain generalization have the potential to be applied to difficult forms of sample or distribution generalization. Transfer learning approaches to (3) domain generalization are summarized, as are recent advances and the wealth of domain adaptation benchmark datasets available. Recent breakthroughs surveyed in (4) task generalization include few-shot meta-learning approaches and the BERT NLP engine, and recent (5) modality generalization studies are discussed that integrate image and text data and that apply a biologically-inspired network across olfactory, visual, and auditory modalities. Recent (6) scope generalization results are reviewed that embed knowledge graphs into deep NLP approaches. Additionally, concepts from neuroscience are discussed on the modular architecture of brains and the steps by which dopamine-driven conditioning leads to abstract thinking.
    Online Active Learning for Soft Sensor Development using Semi-Supervised Autoencoders. (arXiv:2212.13067v2 [cs.LG] UPDATED)
    Data-driven soft sensors are extensively used in industrial and chemical processes to predict hard-to-measure process variables whose real value is difficult to track during routine operations. The regression models used by these sensors often require a large number of labeled examples, yet obtaining the label information can be very expensive given the high time and cost required by quality inspections. In this context, active learning methods can be highly beneficial as they can suggest the most informative labels to query. However, most of the active learning strategies proposed for regression focus on the offline setting. In this work, we adapt some of these approaches to the stream-based scenario and show how they can be used to select the most informative data points. We also demonstrate how to use a semi-supervised architecture based on orthogonal autoencoders to learn salient features in a lower dimensional space. The Tennessee Eastman Process is used to compare the predictive performance of the proposed approaches.
    Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity. (arXiv:2204.06242v2 [stat.ML] UPDATED)
    Many real-world systems are described not only by data from a single source but via multiple data views. In genomic medicine, for instance, patients can be characterized by data from different molecular layers. Latent variable models with structured sparsity are a commonly used tool for disentangling variation within and across data views. However, their interpretability is cumbersome since it requires a direct inspection and interpretation of each factor from domain experts. Here, we propose MuVI, a novel multi-view latent variable model based on a modified horseshoe prior for modeling structured sparsity. This facilitates the incorporation of limited and noisy domain knowledge, thereby allowing for an analysis of multi-view data in an inherently explainable manner. We demonstrate that our model (i) outperforms state-of-the-art approaches for modeling structured sparsity in terms of the reconstruction error and the precision/recall, (ii) robustly integrates noisy domain expertise in the form of feature sets, (iii) promotes the identifiability of factors and (iv) infers interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
    Null Hypothesis Test for Anomaly Detection. (arXiv:2210.02226v3 [hep-ph] UPDATED)
    We extend the use of Classification Without Labels for anomaly detection with a hypothesis test designed to exclude the background-only hypothesis. By testing for statistical independence of the two discriminating dataset regions, we are able to exclude the background-only hypothesis without relying on fixed anomaly score cuts or extrapolations of background estimates between regions. The method relies on the assumption of conditional independence of anomaly score features and dataset regions, which can be ensured using existing decorrelation techniques. As a benchmark example, we consider the LHC Olympics dataset where we show that mutual information represents a suitable test for statistical independence and our method exhibits excellent and robust performance at different signal fractions even in presence of realistic feature correlations.
    Age of Information in Deep Learning-Driven Task-Oriented Communications. (arXiv:2301.04298v2 [cs.IT] UPDATED)
    This paper studies the notion of age in task-oriented communications that aims to execute a task at a receiver utilizing the data at its transmitter. The transmitter-receiver operations are modeled as an encoder-decoder pair that is jointly trained while considering channel effects. The encoder converts data samples into feature vectors of small dimension and transmits them with a small number of channel uses thereby reducing the number of transmissions and latency. Instead of reconstructing input samples, the decoder performs a task, e.g., classification, on the received signals. Applying different deep neural networks of encoder-decoder pairs on MNIST and CIFAR-10 image datasets, the classifier accuracy is shown to increase with the number of channel uses at the expense of longer service time. The peak age of task information (PAoTI) is introduced to analyze this accuracy-latency tradeoff when the age grows unless a received signal is classified correctly. By incorporating channel and traffic effects, design guidelines are obtained for task-oriented communications by characterizing how the PAoTI first decreases and then increases with the number of channel uses. A dynamic update mechanism is presented to adapt the number of channel uses to channel and traffic conditions, and reduce the PAoTI in task-oriented communications.
    Interpretable Outlier Summarization. (arXiv:2303.06261v2 [cs.LG] UPDATED)
    Outlier detection is critical in real applications to prevent financial fraud, defend network intrusions, or detecting imminent device failures. To reduce the human effort in evaluating outlier detection results and effectively turn the outliers into actionable insights, the users often expect a system to automatically produce interpretable summarizations of subgroups of outlier detection results. Unfortunately, to date no such systems exist. To fill this gap, we propose STAIR which learns a compact set of human understandable rules to summarize and explain the anomaly detection results. Rather than use the classical decision tree algorithms to produce these rules, STAIR proposes a new optimization objective to produce a small number of rules with least complexity, hence strong interpretability, to accurately summarize the detection results. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high dimensional, highly complex data sets which are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, thus more amenable for humans to understand and evaluate, compared to the decision tree methods.
    Flex-Net: A Graph Neural Network Approach to Resource Management in Flexible Duplex Networks. (arXiv:2301.11166v2 [cs.NI] UPDATED)
    Flexible duplex networks allow users to dynamically employ uplink and downlink channels without static time scheduling, thereby utilizing the network resources efficiently. This work investigates the sum-rate maximization of flexible duplex networks. In particular, we consider a network with pairwise-fixed communication links. Corresponding combinatorial optimization is a non-deterministic polynomial (NP)-hard without a closed-form solution. In this respect, the existing heuristics entail high computational complexity, raising a scalability issue in large networks. Motivated by the recent success of Graph Neural Networks (GNNs) in solving NP-hard wireless resource management problems, we propose a novel GNN architecture, named Flex-Net, to jointly optimize the communication direction and transmission power. The proposed GNN produces near-optimal performance meanwhile maintaining a low computational complexity compared to the most commonly used techniques. Furthermore, our numerical results shed light on the advantages of using GNNs in terms of sample complexity, scalability, and generalization capability.
    SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning. (arXiv:2301.10921v2 [cs.LG] UPDATED)
    The critical challenge of Semi-Supervised Learning (SSL) is how to effectively leverage the limited labeled data and massive unlabeled data to improve the model's generalization performance. In this paper, we first revisit the popular pseudo-labeling methods via a unified sample weighting formulation and demonstrate the inherent quantity-quality trade-off problem of pseudo-labeling with thresholding, which may prohibit learning. To this end, we propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training, effectively exploiting the unlabeled data. We derive a truncated Gaussian function to weight samples based on their confidence, which can be viewed as a soft version of the confidence threshold. We further enhance the utilization of weakly-learned classes by proposing a uniform alignment approach. In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
    Quantifying the Effect of Feedback Frequency in Interactive Reinforcement Learning for Robotic Tasks. (arXiv:2207.09845v2 [cs.RO] UPDATED)
    Reinforcement learning (RL) has become widely adopted in robot control. Despite many successes, one major persisting problem can be very low data efficiency. One solution is interactive feedback, which has been shown to speed up RL considerably. As a result, there is an abundance of different strategies, which are, however, primarily tested on discrete grid-world and small scale optimal control scenarios. In the literature, there is no consensus about which feedback frequency is optimal or at which time the feedback is most beneficial. To resolve these discrepancies we isolate and quantify the effect of feedback frequency in robotic tasks with continuous state and action spaces. The experiments encompass inverse kinematics learning for robotic manipulator arms of different complexity. We show that seemingly contradictory reported phenomena occur at different complexity levels. Furthermore, our results suggest that no single ideal feedback frequency exists. Rather that feedback frequency should be changed as the agent's proficiency in the task increases.
    Biologically-Inspired Continual Learning of Human Motion Sequences. (arXiv:2211.05231v2 [cs.CV] UPDATED)
    This work proposes a model for continual learning on tasks involving temporal sequences, specifically, human motions. It improves on a recently proposed brain-inspired replay model (BI-R) by building a biologically-inspired conditional temporal variational autoencoder (BI-CTVAE), which instantiates a latent mixture-of-Gaussians for class representation. We investigate a novel continual-learning-to-generate (CL2Gen) scenario where the model generates motion sequences of different classes. The generative accuracy of the model is tested over a set of tasks. The final classification accuracy of BI-CTVAE on a human motion dataset after sequentially learning all action classes is 78%, which is 63% higher than using no-replay, and only 5.4% lower than a state-of-the-art offline trained GRU model.
    Stabilizing and Improving Federated Learning with Non-IID Data and Client Dropout. (arXiv:2303.06314v2 [cs.LG] UPDATED)
    The label distribution skew induced data heterogeniety has been shown to be a significant obstacle that limits the model performance in federated learning, which is particularly developed for collaborative model training over decentralized data sources while preserving user privacy. This challenge could be more serious when the participating clients are in unstable circumstances and dropout frequently. Previous work and our empirical observations demonstrate that the classifier head for classification task is more sensitive to label skew and the unstable performance of FedAvg mainly lies in the imbalanced training samples across different classes. The biased classifier head will also impact the learning of feature representations. Therefore, maintaining a balanced classifier head is of significant importance for building a better global model. To this end, we propose a simple yet effective framework by introducing a prior-calibrated softmax function for computing the cross-entropy loss and a prototype-based feature augmentation scheme to re-balance the local training, which are lightweight for edge devices and can facilitate the global model aggregation. The improved model performance over existing baselines in the presence of non-IID data and client dropout is demonstrated by conducting extensive experiments on benchmark classification tasks.
    DACOS-A Manually Annotated Dataset of Code Smells. (arXiv:2303.08729v1 [cs.SE])
    Researchers apply machine-learning techniques for code smell detection to counter the subjectivity of many code smells. Such approaches need a large, manually annotated dataset for training and benchmarking. Existing literature offers a few datasets; however, they are small in size and, more importantly, do not focus on the subjective code snippets. In this paper, we present DACOS, a manually annotated dataset containing 10,267 annotations for 5,192 code snippets. The dataset targets three kinds of code smells at different granularity: multifaceted abstraction, complex method, and long parameter list. The dataset is created in two phases. The first phase helps us identify the code snippets that are potentially subjective by determining the thresholds of metrics used to detect a smell. The second phase collects annotations for potentially subjective snippets. We also offer an extended dataset DACOSX that includes definitely benign and definitely smelly snippets by using the thresholds identified in the first phase. We have developed TagMan, a web application to help annotators view and mark the snippets one-by-one and record the provided annotations. We make the datasets and the web application accessible publicly. This dataset will help researchers working on smell detection techniques to build relevant and context-aware machine-learning models.
    Visual Reinforcement Learning with Self-Supervised 3D Representations. (arXiv:2210.07241v2 [cs.LG] UPDATED)
    A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional learning signal and inductive biases. However, while the real world is inherently 3D, prior efforts have largely been focused on leveraging 2D computer vision techniques as auxiliary self-supervision. In this work, we present a unified framework for self-supervised learning of 3D representations for motor control. Our proposed framework consists of two phases: a pretraining phase where a deep voxel-based 3D autoencoder is pretrained on a large object-centric dataset, and a finetuning phase where the representation is jointly finetuned together with RL on in-domain data. We empirically show that our method enjoys improved sample efficiency in simulated manipulation tasks compared to 2D representation learning methods. Additionally, our learned policies transfer zero-shot to a real robot setup with only approximate geometric correspondence, and successfully solve motor control tasks that involve grasping and lifting from a single, uncalibrated RGB camera. Code and videos are available at https://yanjieze.com/3d4rl/ .
    Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring. (arXiv:2303.08271v1 [cs.AI])
    We study Markov decision processes (MDPs), where agents have direct control over when and how they gather information, as formalized by action-contingent noiselessly observable MDPs (ACNO-MPDs). In these models, actions consist of two components: a control action that affects the environment, and a measurement action that affects what the agent can observe. To solve ACNO-MDPs, we introduce the act-then-measure (ATM) heuristic, which assumes that we can ignore future state uncertainty when choosing control actions. We show how following this heuristic may lead to shorter policy computation times and prove a bound on the performance loss incurred by the heuristic. To decide whether or not to take a measurement action, we introduce the concept of measuring value. We develop a reinforcement learning algorithm based on the ATM heuristic, using a Dyna-Q variant adapted for partially observable domains, and showcase its superior performance compared to prior methods on a number of partially-observable environments.
    Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting. (arXiv:2110.03135v3 [cs.LG] UPDATED)
    We show that label noise exists in adversarial training. Such label noise is due to the mismatch between the true label distribution of adversarial examples and the label inherited from clean examples - the true label distribution is distorted by the adversarial perturbation, but is neglected by the common practice that inherits labels from clean examples. Recognizing label noise sheds insights on the prevalence of robust overfitting in adversarial training, and explains its intriguing dependence on perturbation radius and data quality. Also, our label noise perspective aligns well with our observations of the epoch-wise double descent in adversarial training. Guided by our analyses, we proposed a method to automatically calibrate the label to address the label noise and robust overfitting. Our method achieves consistent performance improvements across various models and datasets without introducing new hyper-parameters or additional tuning.
    Masked Vision and Language Modeling for Multi-modal Representation Learning. (arXiv:2208.02131v2 [cs.CV] UPDATED)
    In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with the help from another modality. This is motivated by the nature of image-text paired data that both of the image and the text convey almost the same information but in different formats. The masked signal reconstruction of one modality conditioned on another modality can also implicitly learn cross-modal alignment between language tokens and image patches. Our experiments on various V+L tasks show that the proposed method, along with common V+L alignment losses, achieves state-of-the-art performance in the regime of millions of pre-training data. Also, we outperforms the other competitors by a significant margin in limited data scenarios.
    The Devil's Advocate: Shattering the Illusion of Unexploitable Data using Diffusion Models. (arXiv:2303.08500v1 [cs.LG])
    Protecting personal data against the exploitation of machine learning models is of paramount importance. Recently, availability attacks have shown great promise to provide an extra layer of protection against the unauthorized use of data to train neural networks. These methods aim to add imperceptible noise to clean data so that the neural networks cannot extract meaningful patterns from the protected data, claiming that they can make personal data "unexploitable." In this paper, we provide a strong countermeasure against such approaches, showing that unexploitable data might only be an illusion. In particular, we leverage the power of diffusion models and show that a carefully designed denoising process can defuse the ramifications of the data-protecting perturbations. We rigorously analyze our algorithm, and theoretically prove that the amount of required denoising is directly related to the magnitude of the data-protecting perturbations. Our approach, called AVATAR, delivers state-of-the-art performance against a suite of recent availability attacks in various scenarios, outperforming adversarial training. Our findings call for more research into making personal data unexploitable, showing that this goal is far from over.
    Dodging DeepFake Detection via Implicit Spatial-Domain Notch Filtering. (arXiv:2009.09213v4 [cs.CV] UPDATED)
    The current high-fidelity generation and high-precision detection of DeepFake images are at an arms race. We believe that producing DeepFakes that are highly realistic and 'detection evasive' can serve the ultimate goal of improving future generation DeepFake detection capabilities. In this paper, we propose a simple yet powerful pipeline to reduce the artifact patterns of fake images without hurting image quality by performing implicit spatial-domain notch filtering. We first demonstrate that frequency-domain notch filtering, although famously shown to be effective in removing periodic noise in the spatial domain, is infeasible for our task at hand due to the manual designs required for the notch filters. We, therefore, resort to a learning-based approach to reproduce the notch filtering effects, but solely in the spatial domain. We adopt a combination of adding overwhelming spatial noise for breaking the periodic noise pattern and deep image filtering to reconstruct the noise-free fake images, and we name our method DeepNotch. Deep image filtering provides a specialized filter for each pixel in the noisy image, producing filtered images with high fidelity compared to their DeepFake counterparts. Moreover, we also use the semantic information of the image to generate an adversarial guidance map to add noise intelligently. Our large-scale evaluation on 3 representative state-of-the-art DeepFake detection methods (tested on 16 types of DeepFakes) has demonstrated that our technique significantly reduces the accuracy of these 3 fake image detection methods, 36.79% on average and up to 97.02% in the best case.
    Thunderstorm nowcasting with deep learning: a multi-hazard data fusion model. (arXiv:2211.01001v2 [physics.ao-ph] UPDATED)
    Predictions of thunderstorm-related hazards are needed in several sectors, including first responders, infrastructure management and aviation. To address this need, we present a deep learning model that can be adapted to different hazard types. The model can utilize multiple data sources; we use data from weather radar, lightning detection, satellite visible/infrared imagery, numerical weather prediction and digital elevation models. We demonstrate the ability of the model to predict lightning, hail and heavy precipitation probabilistically on a 1 km resolution grid, with a temporal resolution of 5 min and lead times up to 60 min. Shapley values quantify the importance of the different data sources, showing that the weather radar products are the most important predictors for all three hazard types.
    Load Encoding for Learning AC-OPF. (arXiv:2101.03973v2 [eess.SY] UPDATED)
    The AC Optimal Power Flow (AC-OPF) problem is a core building block in electrical transmission system. It seeks the most economical active and reactive generation dispatch to meet demands while satisfying transmission operational limits. It is often solved repeatedly, especially in regions with large penetration of wind farms to avoid violating operational and physical limits. Recent work has shown that deep learning techniques have huge potential in providing accurate approximations of AC-OPF solutions. However, deep learning approaches often suffer from scalability issues, especially when applied to real life power grids. This paper focuses on the scalability limitation and proposes a load compression embedding scheme to reduce training model sizes using a 3-step approach. The approach is evaluated experimentally on large-scale test cases from the PGLib, and produces an order of magnitude improvements in training convergence and prediction accuracy.
    On Stability and Generalization of Bilevel Optimization Problem. (arXiv:2210.01063v3 [cs.LG] UPDATED)
    (Stochastic) bilevel optimization is a frequently encountered problem in machine learning with a wide range of applications such as meta-learning, hyper-parameter optimization, and reinforcement learning. Most of the existing studies on this problem only focused on analyzing the convergence or improving the convergence rate, while little effort has been devoted to understanding its generalization behaviors. In this paper, we conduct a thorough analysis on the generalization of first-order (gradient-based) methods for the bilevel optimization problem. We first establish a fundamental connection between algorithmic stability and generalization error in different forms and give a high probability generalization bound which improves the previous best one from $\bigO(\sqrt{n})$ to $\bigO(\log n)$, where $n$ is the sample size. We then provide the first stability bounds for the general case where both inner and outer level parameters are subject to continuous update, while existing work allows only the outer level parameter to be updated. Our analysis can be applied in various standard settings such as strongly-convex-strongly-convex (SC-SC), convex-convex (C-C), and nonconvex-nonconvex (NC-NC). Our analysis for the NC-NC setting can also be extended to a particular nonconvex-strongly-convex (NC-SC) setting that is commonly encountered in practice. Finally, we corroborate our theoretical analysis and demonstrate how iterations can affect the generalization error by experiments on meta-learning and hyper-parameter optimization.
    Betty: An Automatic Differentiation Library for Multilevel Optimization. (arXiv:2207.02849v2 [cs.LG] UPDATED)
    Gradient-based multilevel optimization (MLO) has gained attention as a framework for studying numerous problems, ranging from hyperparameter optimization and meta-learning to neural architecture search and reinforcement learning. However, gradients in MLO, which are obtained by composing best-response Jacobians via the chain rule, are notoriously difficult to implement and memory/compute intensive. We take an initial step towards closing this gap by introducing Betty, a software library for large-scale MLO. At its core, we devise a novel dataflow graph for MLO, which allows us to (1) develop efficient automatic differentiation for MLO that reduces the computational complexity from O(d^3) to O(d^2), (2) incorporate systems support such as mixed-precision and data-parallel training for scalability, and (3) facilitate implementation of MLO programs of arbitrary complexity while allowing a modular interface for diverse algorithmic and systems design choices. We empirically demonstrate that Betty can be used to implement an array of MLO programs, while also observing up to 11% increase in test accuracy, 14% decrease in GPU memory usage, and 20% decrease in training wall time over existing implementations on multiple benchmarks. We also showcase that Betty enables scaling MLO to models with hundreds of millions of parameters. We open-source the code at https://github.com/leopard-ai/betty.
    Recurrent Neural Networks and Universal Approximation of Bayesian Filters. (arXiv:2211.00335v2 [stat.ML] UPDATED)
    We consider the Bayesian optimal filtering problem: i.e. estimating some conditional statistics of a latent time-series signal from an observation sequence. Classical approaches often rely on the use of assumed or estimated transition and observation models. Instead, we formulate a generic recurrent neural network framework and seek to learn directly a recursive mapping from observational inputs to the desired estimator statistics. The main focus of this article is the approximation capabilities of this framework. We provide approximation error bounds for filtering in general non-compact domains. We also consider strong time-uniform approximation error bounds that guarantee good long-time performance. We discuss and illustrate a number of practical concerns and implications of these results.
    Multimodal Lyrics-Rhythm Matching. (arXiv:2301.02732v2 [cs.SD] UPDATED)
    Despite the recent increase in research on artificial intelligence for music, prominent correlations between key components of lyrics and rhythm such as keywords, stressed syllables, and strong beats are not frequently studied. This is likely due to challenges such as audio misalignment, inaccuracies in syllabic identification, and most importantly, the need for cross-disciplinary knowledge. To address this lack of research, we propose a novel multimodal lyrics-rhythm matching approach in this paper that specifically matches key components of lyrics and music with each other without any language limitations. We use audio instead of sheet music with readily available metadata, which creates more challenges yet increases the application flexibility of our method. Furthermore, our approach creatively generates several patterns involving various multimodalities, including music strong beats, lyrical syllables, auditory changes in a singer's pronunciation, and especially lyrical keywords, which are utilized for matching key lyrical elements with key rhythmic elements. This advantageous approach not only provides a unique way to study auditory lyrics-rhythm correlations including efficient rhythm-based audio alignment algorithms, but also bridges computational linguistics with music as well as music cognition. Our experimental results reveal an 0.81 probability of matching on average, and around 30% of the songs have a probability of 0.9 or higher of keywords landing on strong beats, including 12% of the songs with a perfect landing. Also, the similarity metrics are used to evaluate the correlation between lyrics and rhythm. It shows that nearly 50% of the songs have 0.70 similarity or higher. In conclusion, our approach contributes significantly to the lyrics-rhythm relationship by computationally unveiling insightful correlations.
    Prompting Large Language Models With the Socratic Method. (arXiv:2303.08769v1 [cs.LG])
    This paper outlines a systematic approach to using the Socratic method in developing prompt templates that effectively interact with large language models, including GPT-3. We examine various methods and identify those that yield precise answers and justifications while simultaneously fostering creativity and imagination to enhance creative writing. Specifically, we discuss how techniques such as definition, elenchus, dialectic, maieutics, generalization, and counterfactual reasoning can be applied in engineering prompt templates, and provide practical examples that demonstrate their effectiveness in performing inductive, deductive, and abductive reasoning.
    Phased Progressive Learning with Coupling-Regulation-Imbalance Loss for Imbalanced Data Classification. (arXiv:2205.12117v3 [cs.LG] UPDATED)
    Deep convolutional neural networks often perform poorly when faced with datasets that suffer from quantity imbalances and classification difficulties. Despite advances in the field, existing two-stage approaches still exhibit dataset bias or domain shift. To counter this, a phased progressive learning schedule has been proposed that gradually shifts the emphasis from representation learning to training the upper classifier. This approach is particularly beneficial for datasets with larger imbalances or fewer samples. Another new method a coupling-regulation-imbalance loss function is proposed, which combines three parts: a correction term, Focal loss, and LDAM loss. This loss is effective in addressing quantity imbalances and outliers, while regulating the focus of attention on samples with varying classification difficulties. These approaches have yielded satisfactory results on several benchmark datasets, including Imbalanced CIFAR10, Imbalanced CIFAR100, ImageNet-LT, and iNaturalist 2018, and can be easily generalized to other imbalanced classification models.
    Learning Minimally-Violating Continuous Control for Infeasible Linear Temporal Logic Specifications. (arXiv:2210.01162v4 [cs.RO] UPDATED)
    This paper explores continuous-time control synthesis for target-driven navigation to satisfy complex high-level tasks expressed as linear temporal logic (LTL). We propose a model-free framework using deep reinforcement learning (DRL) where the underlying dynamic system is unknown (an opaque box). Unlike prior work, this paper considers scenarios where the given LTL specification might be infeasible and therefore cannot be accomplished globally. Instead of modifying the given LTL formula, we provide a general DRL-based approach to satisfy it with minimal violation. To do this, we transform a previously multi-objective DRL problem, which requires simultaneous automata satisfaction and minimum violation cost, into a single objective. By guiding the DRL agent with a sampling-based path planning algorithm for the potentially infeasible LTL task, the proposed approach mitigates the myopic tendencies of DRL, which are often an issue when learning general LTL tasks that can have long or infinite horizons. This is achieved by decomposing an infeasible LTL formula into several reach-avoid sub-tasks with shorter horizons, which can be trained in a modular DRL architecture. Furthermore, we overcome the challenge of the exploration process for DRL in complex and cluttered environments by using path planners to design rewards that are dense in the configuration space. The benefits of the presented approach are demonstrated through testing on various complex nonlinear systems and compared with state-of-the-art baselines. The Video demonstration can be found here:https://youtu.be/jBhx6Nv224E.
    PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining. (arXiv:2303.08789v1 [cs.RO])
    A rich representation is key to general robotic manipulation, but existing model architectures require a lot of data to learn it. Unfortunately, ideal robotic manipulation training data, which comes in the form of expert visuomotor demonstrations for a variety of annotated tasks, is scarce. In this work we propose PLEX, a transformer-based architecture that learns from task-agnostic visuomotor trajectories accompanied by a much larger amount of task-conditioned object manipulation videos -- a type of robotics-relevant data available in quantity. The key insight behind PLEX is that the trajectories with observations and actions help induce a latent feature space and train a robot to execute task-agnostic manipulation routines, while a diverse set of video-only demonstrations can efficiently teach the robot how to plan in this feature space for a wide variety of tasks. In contrast to most works on robotic manipulation pretraining, PLEX learns a generalizable sensorimotor multi-task policy, not just an observational representation. We also show that using relative positional encoding in PLEX's transformers further increases its data efficiency when learning from human-collected demonstrations. Experiments showcase \appr's generalization on Meta-World-v2 benchmark and establish state-of-the-art performance in challenging Robosuite environments.
    Learning Resilient Radio Resource Management Policies with Graph Neural Networks. (arXiv:2203.11012v2 [eess.SP] UPDATED)
    We consider the problems of user selection and power control in wireless interference networks, comprising multiple access points (APs) communicating with a group of user equipment devices (UEs) over a shared wireless medium. To achieve a high aggregate rate, while ensuring fairness across all users, we formulate a resilient radio resource management (RRM) policy optimization problem with per-user minimum-capacity constraints that adapt to the underlying network conditions via learnable slack variables. We reformulate the problem in the Lagrangian dual domain, and show that we can parameterize the RRM policies using a finite set of parameters, which can be trained alongside the slack and dual variables via an unsupervised primal-dual approach thanks to a provably small duality gap. We use a scalable and permutation-equivariant graph neural network (GNN) architecture to parameterize the RRM policies based on a graph topology derived from the instantaneous channel conditions. Through experimental results, we verify that the minimum-capacity constraints adapt to the underlying network configurations and channel conditions. We further demonstrate that, thanks to such adaptation, our proposed method achieves a superior tradeoff between the average rate and the 5th percentile rate -- a metric that quantifies the level of fairness in the resource allocation decisions -- as compared to baseline algorithms.
    Training Neural Networks for Sequential Change-point Detection. (arXiv:2210.17312v2 [cs.LG] UPDATED)
    Detecting an abrupt distributional shift of a data stream, known as change-point detection, is a fundamental problem in statistics and machine learning. We introduce a novel approach for online change-point detection using neural networks. To be specific, our approach is training neural networks to compute the cumulative sum of a detection statistic sequentially, which exhibits a significant change when a change-point occurs. We demonstrated the superiority and potential of the proposed method in detecting change-point using both synthetic and real-world data.
    Robust High-speed Running for Quadruped Robots via Deep Reinforcement Learning. (arXiv:2103.06484v2 [cs.RO] UPDATED)
    Deep reinforcement learning has emerged as a popular and powerful way to develop locomotion controllers for quadruped robots. Common approaches have largely focused on learning actions directly in joint space, or learning to modify and offset foot positions produced by trajectory generators. Both approaches typically require careful reward shaping and training for millions of time steps, and with trajectory generators introduce human bias into the resulting control policies. In this paper, we present a learning framework that leads to the natural emergence of fast and robust bounding policies for quadruped robots. The agent both selects and controls actions directly in task space to track desired velocity commands subject to environmental noise including model uncertainty and rough terrain. We observe that this framework improves sample efficiency, necessitates little reward shaping, leads to the emergence of natural gaits such as galloping and bounding, and eases the sim-to-real transfer at running speeds. Policies can be learned in only a few million time steps, even for challenging tasks of running over rough terrain with loads of over 100% of the nominal quadruped mass. Training occurs in PyBullet, and we perform a sim-to-sim transfer to Gazebo and sim-to-real transfer to the Unitree A1 hardware. For sim-to-sim, our results show the quadruped is able to run at over 4 m/s without a load, and 3.5 m/s with a 10 kg load, which is over 83% of the nominal quadruped mass. For sim-to-real, the Unitree A1 is able to bound at 2 m/s with a 5 kg load, representing 42% of the nominal quadruped mass.
    Contrast and Clustering: Learning Neighborhood Pair Representation for Source-free Domain Adaptation. (arXiv:2301.13428v2 [cs.CV] UPDATED)
    Unsupervised domain adaptation uses source data from different distributions to solve the problem of classifying data from unlabeled target domains. However, conventional methods require access to source data, which often raise concerns about data privacy. In this paper, we consider a more practical but challenging setting where the source domain data is unavailable and the target domain data is unlabeled. Specifically, we address the domain discrepancy problem from the perspective of contrastive learning. The key idea of our work is to learn a domain-invariant feature by 1) performing clustering directly in the original feature space with nearest neighbors; 2) constructing truly hard negative pairs by extended neighbors without introducing additional computational complexity; and 3) combining noise-contrastive estimation theory to gain computational advantage. We conduct careful ablation studies and extensive experiments on three common benchmarks: VisDA, Office-Home, and Office-31. The results demonstrate the superiority of our methods compared with other state-of-the-art works.
    Efficient Compressed Ratio Estimation using Online Sequential Learning for Edge Computing. (arXiv:2211.04284v2 [cs.LG] UPDATED)
    Owing to the widespread adoption of the Internet of Things, a vast amount of sensor information is being acquired in real time. Accordingly, the communication cost of data from edge devices is increasing. Compressed sensing (CS), a data compression method that can be used on edge devices, has been attracting attention as a method to reduce communication costs. In CS, estimating the appropriate compression ratio is important. There is a method to adaptively estimate the compression ratio for the acquired data using reinforcement learning (RL). However, the computational costs associated with existing RL methods that can be utilized on edges are often high. In this study, we developed an efficient RL method for edge devices, referred to as the actor--critic online sequential extreme learning machine (AC-OSELM), and a system to compress data by estimating an appropriate compression ratio on the edge using AC-OSELM. The performance of the proposed method in estimating the compression ratio is evaluated by comparing it with other RL methods for edge devices. The experimental results indicate that AC-OSELM demonstrated the same or better compression performance and faster compression ratio estimation than the existing methods.
    NovelCraft: A Dataset for Novelty Detection and Discovery in Open Worlds. (arXiv:2206.11736v2 [cs.CV] UPDATED)
    In order for artificial agents to successfully perform tasks in changing environments, they must be able to both detect and adapt to novelty. However, visual novelty detection research often only evaluates on repurposed datasets such as CIFAR-10 originally intended for object classification, where images focus on one distinct, well-centered object. New benchmarks are needed to represent the challenges of navigating the complex scenes of an open world. Our new NovelCraft dataset contains multimodal episodic data of the images and symbolic world-states seen by an agent completing a pogo stick assembly task within a modified Minecraft environment. In some episodes, we insert novel objects within the complex 3D scene that may impact gameplay and appear in a variety of sizes and positions. Our visual novelty detection benchmark finds that methods that rank best on popular area-under-the-curve metrics may be outperformed by simpler alternatives when controlling false positives matters most. Further multi-modal novelty detection experiments suggest that methods that fuse both visual and symbolic information can improve time until detection as well as overall discrimination. Finally, our evaluation of recent generalized category discovery methods suggests that adapting to new imbalanced categories in complex scenes remains an exciting open problem.
    A machine-learning approach to thunderstorm forecasting through post-processing of simulation data. (arXiv:2303.08736v1 [physics.ao-ph])
    Thunderstorms pose a major hazard to society and economy, which calls for reliable thunderstorm forecasts. In this work, we introduce SALAMA, a feedforward neural network model for identifying thunderstorm occurrence in numerical weather prediction (NWP) data. The model is trained on convection-resolving ensemble forecasts over Central Europe and lightning observations. Given only a set of pixel-wise input parameters that are extracted from NWP data and related to thunderstorm development, SALAMA infers the probability of thunderstorm occurrence in a reliably calibrated manner. For lead times up to eleven hours, we find a forecast skill superior to classification based only on convective available potential energy. Varying the spatiotemporal criteria by which we associate lightning observations with NWP data, we show that the time scale for skillful thunderstorm predictions increases linearly with the spatial scale of the forecast.
    Gradient Gating for Deep Multi-Rate Learning on Graphs. (arXiv:2210.00513v2 [cs.LG] UPDATED)
    We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs). Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph. Local gradients are harnessed to further modulate message passing updates. Our framework flexibly allows one to use any basic GNN layer as a wrapper around which the multi-rate gradient gating mechanism is built. We rigorously prove that G$^2$ alleviates the oversmoothing problem and allows the design of deep GNNs. Empirical results are presented to demonstrate that the proposed framework achieves state-of-the-art performance on a variety of graph learning tasks, including on large-scale heterophilic graphs.
    Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis. (arXiv:2303.08811v1 [cs.LG])
    Natural behavior consists of dynamics that are complex and unpredictable, especially when trying to predict many steps into the future. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings where behavior becomes increasingly hard to model. In this work, we develop a multi-task representation learning model for behavior that combines two novel components: (i) An action prediction objective that aims to predict the distribution of actions over future timesteps, and (ii) A multi-scale architecture that builds separate latent spaces to accommodate short- and long-term dynamics. After demonstrating the ability of the method to build representations of both local and global dynamics in realistic robots in varying environments and terrains, we apply our method to the MABe 2022 Multi-agent behavior challenge, where our model ranks 1st overall and on all global tasks, and 1st or 2nd on 7 out of 9 frame-level tasks. In all of these cases, we show that our model can build representations that capture the many different factors that drive behavior and solve a wide range of downstream tasks.
    Weisfeiler and Leman go Machine Learning: The Story so far. (arXiv:2112.09992v3 [cs.LG] UPDATED)
    In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, have emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine-learning setting, focusing on the supervised regime. We discuss the theoretical background, show how to use it for supervised graph and node representation learning, discuss recent extensions, and outline the algorithm's connection to (permutation-)equivariant neural architectures. Moreover, we give an overview of current applications and future directions to stimulate further research.
    Planning and Learning Using Adaptive Entropy Tree Search. (arXiv:2102.06808v3 [cs.AI] UPDATED)
    Recent breakthroughs in Artificial Intelligence have shown that the combination of tree-based planning with deep learning can lead to superior performance. We present Adaptive Entropy Tree Search (ANTS) - a novel algorithm combining planning and learning in the maximum entropy paradigm. Through a comprehensive suite of experiments on the Atari benchmark we show that ANTS significantly outperforms PUCT, the planning component of the state-of-the-art AlphaZero system. ANTS builds upon recent work on maximum entropy planning methods - which however, as we show, fail in combination with learning. ANTS resolves this issue to reach state-of-the-art performance. We further find that ANTS exhibits superior robustness to different hyperparameter choices, compared to the previous algorithms. We believe that the high performance and robustness of ANTS can bring tree search planning one step closer to wide practical adoption.
    Interpretable Ensembles of Hyper-Rectangles as Base Models. (arXiv:2303.08625v1 [cs.LG])
    A new extremely simple ensemble-based model with the uniformly generated axis-parallel hyper-rectangles as base models (HRBM) is proposed. Two types of HRBMs are studied: closed rectangles and corners. The main idea behind HRBM is to consider and count training examples inside and outside each rectangle. It is proposed to incorporate HRBMs into the gradient boosting machine (GBM). Despite simplicity of HRBMs, it turns out that these simple base models allow us to construct effective ensemble-based models and avoid overfitting. A simple method for calculating optimal regularization parameters of the ensemble-based model, which can be modified in the explicit way at each iteration of GBM, is considered. Moreover, a new regularization called the "step height penalty" is studied in addition to the standard L1 and L2 regularizations. An extremely simple approach to the proposed ensemble-based model prediction interpretation by using the well-known method SHAP is proposed. It is shown that GBM with HRBM can be regarded as a model extending a set of interpretable models for explaining black-box models. Numerical experiments with real datasets illustrate the proposed GBM with HRBMs for regression and classification problems. Experiments also illustrate computational efficiency of the proposed SHAP modifications. The code of proposed algorithms implementing GBM with HRBM is publicly available.
    Trigger-Level Event Reconstruction for Neutrino Telescopes Using Sparse Submanifold Convolutional Neural Networks. (arXiv:2303.08812v1 [hep-ex])
    Convolutional neural networks (CNNs) have seen extensive applications in scientific data analysis, including in neutrino telescopes. However, the data from these experiments present numerous challenges to CNNs, such as non-regular geometry, sparsity, and high dimensionality. Consequently, CNNs are highly inefficient on neutrino telescope data, and require significant pre-processing that results in information loss. We propose sparse submanifold convolutions (SSCNNs) as a solution to these issues and show that the SSCNN event reconstruction performance is comparable to or better than traditional and machine learning algorithms. Additionally, our SSCNN runs approximately 16 times faster than a traditional CNN on a GPU. As a result of this speedup, it is expected to be capable of handling the trigger-level event rate of IceCube-scale neutrino telescopes. These networks could be used to improve the first estimation of the neutrino energy and direction to seed more advanced reconstructions, or to provide this information to an alert-sending system to quickly follow-up interesting events.
    Borda Regret Minimization for Generalized Linear Dueling Bandits. (arXiv:2303.08816v1 [cs.LG])
    Dueling bandits are widely used to model preferential feedback that is prevalent in machine learning applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a new and highly expressive generalized linear dueling bandits model, which covers many existing models. Surprisingly, the Borda regret minimization problem turns out to be difficult, as we prove a regret lower bound of order $\Omega(d^{2/3} T^{2/3})$, where $d$ is the dimension of contextual vectors and $T$ is the time horizon. To attain the lower bound, we propose an explore-then-commit type algorithm, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$. When the number of items/arms $K$ is small, our algorithm can achieve a smaller regret $\tilde{O}( (d \log K)^{1/3} T^{2/3})$ with proper choices of hyperparameters. We also conduct empirical experiments on both synthetic data and a simulated real-world environment, which corroborate our theoretical analysis.
    Joint Graph and Vertex Importance Learning. (arXiv:2303.08552v1 [eess.SP])
    In this paper, we explore the topic of graph learning from the perspective of the Irregularity-Aware Graph Fourier Transform, with the goal of learning the graph signal space inner product to better model data. We propose a novel method to learn a graph with smaller edge weight upper bounds compared to combinatorial Laplacian approaches. Experimentally, our approach yields much sparser graphs compared to a combinatorial Laplacian approach, with a more interpretable model.
    Sensitivity-Aware Visual Parameter-Efficient Tuning. (arXiv:2303.08566v1 [cs.CV])
    Visual Parameter-Efficient Tuning (VPET) has become a powerful alternative for full fine-tuning so as to adapt pre-trained vision models to downstream tasks, which only tunes a small number of parameters while freezing the vast majority ones to ease storage burden and optimization difficulty. However, existing VPET methods introduce trainable parameters to the same positions across different tasks depending solely on human heuristics and neglect the domain gaps. To this end, we study where to introduce and how to allocate trainable parameters by proposing a novel Sensitivity-aware visual Parameter-efficient Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions given a desired tunable parameter budget. Specifically, our SPT first quickly identifies the sensitive parameters that require tuning for a given task in a data-dependent way. Next, our SPT further boosts the representational capability for the weight matrices whose number of sensitive parameters exceeds a pre-defined threshold by utilizing any of the existing structured tuning methods, e.g., LoRA or Adapter, to replace directly tuning the selected sensitive parameters (unstructured tuning) under the budget. Extensive experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing VPET methods and largely boosts their performance, e.g., SPT improves Adapter with supervised pre-trained ViT-B/16 backbone by 4.2% and 1.4% mean Top-1 accuracy, reaching SOTA performance on FGVC and VTAB-1k benchmarks, respectively. Source code is at https://github.com/ziplab/SPT
    Investigating GANsformer: A Replication Study of a State-of-the-Art Image Generation Model. (arXiv:2303.08577v1 [cs.CV])
    The field of image generation through generative modelling is abundantly discussed nowadays. It can be used for various applications, such as up-scaling existing images, creating non-existing objects, such as interior design scenes, products or even human faces, and achieving transfer-learning processes. In this context, Generative Adversarial Networks (GANs) are a class of widely studied machine learning frameworks first appearing in the paper "Generative adversarial nets" by Goodfellow et al. that achieve the goal above. In our work, we reproduce and evaluate a novel variation of the original GAN network, the GANformer, proposed in "Generative Adversarial Transformers" by Hudson and Zitnick. This project aimed to recreate the methods presented in this paper to reproduce the original results and comment on the authors' claims. Due to resources and time limitations, we had to constrain the network's training times, dataset types, and sizes. Our research successfully recreated both variations of the proposed GANformer model and found differences between the authors' and our results. Moreover, discrepancies between the publication methodology and the one implemented, made available in the code, allowed us to study two undisclosed variations of the presented procedures.
    Visual Prompt Based Personalized Federated Learning. (arXiv:2303.08678v1 [cs.LG])
    As a popular paradigm of distributed learning, personalized federated learning (PFL) allows personalized models to improve generalization ability and robustness by utilizing knowledge from all distributed clients. Most existing PFL algorithms tackle personalization in a model-centric way, such as personalized layer partition, model regularization, and model interpolation, which all fail to take into account the data characteristics of distributed clients. In this paper, we propose a novel PFL framework for image classification tasks, dubbed pFedPT, that leverages personalized visual prompts to implicitly represent local data distribution information of clients and provides that information to the aggregation model to help with classification tasks. Specifically, in each round of pFedPT training, each client generates a local personalized prompt related to local data distribution. Then, the local model is trained on the input composed of raw data and a visual prompt to learn the distribution information contained in the prompt. During model testing, the aggregated model obtains prior knowledge of the data distributions based on the prompts, which can be seen as an adaptive fine-tuning of the aggregation model to improve model performances on different clients. Furthermore, the visual prompt can be added as an orthogonal method to implement personalization on the client for existing FL methods to boost their performance. Experiments on the CIFAR10 and CIFAR100 datasets show that pFedPT outperforms several state-of-the-art (SOTA) PFL algorithms by a large margin in various settings.
    Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations. (arXiv:2107.12003v3 [cs.CV] UPDATED)
    In this paper, we propose a multi-speaker face-to-speech waveform generation model that also works for unseen speaker conditions. Using a generative adversarial network (GAN) with linguistic and speaker characteristic features as auxiliary conditions, our method directly converts face images into speech waveforms under an end-to-end training framework. The linguistic features are extracted from lip movements using a lip-reading model, and the speaker characteristic features are predicted from face images using cross-modal learning with a pre-trained acoustic model. Since these two features are uncorrelated and controlled independently, we can flexibly synthesize speech waveforms whose speaker characteristics vary depending on the input face images. We show the superiority of our proposed model over conventional methods in terms of objective and subjective evaluation results. Specifically, we evaluate the performances of linguistic features by measuring their accuracy on an automatic speech recognition task. In addition, we estimate speaker and gender similarity for multi-speaker and unseen conditions, respectively. We also evaluate the aturalness of the synthesized speech waveforms using a mean opinion score (MOS) test and non-intrusive objective speech quality assessment (NISQA).The demo samples of the proposed and other models are available at https://sam-0927.github.io/
    Evaluating gesture-generation in a large-scale open challenge: The GENEA Challenge 2022. (arXiv:2303.08737v1 [cs.HC])
    This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research papers, differences in results are here only due to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field. The evaluation results are a revolution, and a revelation. Some synthetic conditions are rated as significantly more human-like than human motion capture. To the best of our knowledge, this has never been shown before on a high-fidelity avatar. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fr\'echet gesture distance (FGD), which achieves a Kendall's tau rank correlation of around -0.5. Based on the challenge results we formulate numerous recommendations for system building and evaluation.
    Singular relaxation of a random walk in a box with a Metropolis Monte Carlo dynamics. (arXiv:2303.08535v1 [cond-mat.stat-mech])
    We study analytically the relaxation eigenmodes of a simple Monte Carlo algorithm, corresponding to a particle in a box which moves by uniform random jumps. Moves outside of the box are rejected. At long times, the system approaches the equilibrium probability density, which is uniform inside the box. We show that the relaxation towards this equilibrium is unusual: for a jump length comparable to the size of the box, the number of relaxation eigenmodes can be surprisingly small, one or two. We provide a complete analytic description of the transition between these two regimes. When only a single relaxation eigenmode is present, a suitable choice of the symmetry of the initial conditions gives a localizing decay to equilibrium. In this case, the deviation from equilibrium concentrates at the edges of the box where the rejection probability is maximal. Finally, in addition to the relaxation analysis of the master equation, we also describe the full eigen-spectrum of the master equation including its sub-leading eigen-modes.
    Health Monitoring of Movement Disorder Subject based on Diamond Stacked Sparse Autoencoder Ensemble Model. (arXiv:2303.08538v1 [cs.LG])
    The health monitoring of chronic diseases is very important for people with movement disorders because of their limited mobility and long duration of chronic diseases. Machine learning-based processing of data collected from the human with movement disorders using wearable sensors is an effective method currently available for health monitoring. However, wearable sensor systems are difficult to obtain high-quality and large amounts of data, which cannot meet the requirement for diagnostic accuracy. Moreover, existing machine learning methods do not handle this problem well. Feature learning is key to machine learning. To solve this problem, a health monitoring of movement disorder subject based on diamond stacked sparse autoencoder ensemble model (DsaeEM) is proposed in this paper. This algorithm has two major components. First, feature expansion is designed using feature-embedded stacked sparse autoencoder (FSSAE). Second, a feature reduction mechanism is designed to remove the redundancy among the expanded features. This mechanism includes L1 regularized feature-reduction algorithm and the improved manifold dimensionality reduction algorithm. This paper refers to the combined feature expansion and feature reduction mechanism as the diamond-like feature learning mechanism. The method is experimentally verified with several state of art algorithms and on two datasets. The results show that the proposed algorithm has higher accuracy apparently. In conclusion, this study developed an effective and feasible feature-learning algorithm for the recognition of chronic diseases.
    Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer. (arXiv:2303.08622v1 [cs.CV])
    Diffusion models have shown great promise in text-guided image style transfer, but there is a trade-off between style transformation and content preservation due to their stochastic nature. Existing methods require computationally expensive fine-tuning of diffusion models or additional neural network. To address this, here we propose a zero-shot contrastive loss for diffusion models that doesn't require additional fine-tuning or auxiliary networks. By leveraging patch-wise contrastive loss between generated samples and original image embeddings in the pre-trained diffusion model, our method can generate images with the same semantic content as the source image in a zero-shot manner. Our approach outperforms existing methods while preserving content and requiring no additional training, not only for image style transfer but also for image-to-image translation and manipulation. Our experimental results validate the effectiveness of our proposed method.
    Smoothed Q-learning. (arXiv:2303.08631v1 [cs.LG])
    In Reinforcement Learning the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate the values and thereby spend too long exploring unhelpful states. Double Q-learning is a provably convergent alternative that mitigates some of the overestimation issues, though sometimes at the expense of slower convergence. We introduce an alternative algorithm that replaces the max operation with an average, resulting also in a provably convergent off-policy algorithm which can mitigate overestimation yet retain similar convergence as standard Q-learning.
    Learning to Reconstruct Signals From Binary Measurements. (arXiv:2303.08691v1 [eess.SP])
    Recent advances in unsupervised learning have highlighted the possibility of learning to reconstruct signals from noisy and incomplete linear measurements alone. These methods play a key role in medical and scientific imaging and sensing, where ground truth data is often scarce or difficult to obtain. However, in practice, measurements are not only noisy and incomplete but also quantized. Here we explore the extreme case of learning from binary observations and provide necessary and sufficient conditions on the number of measurements required for identifying a set of signals from incomplete binary data. Our results are complementary to existing bounds on signal recovery from binary measurements. Furthermore, we introduce a novel self-supervised learning approach, which we name SSBM, that only requires binary data for training. We demonstrate in a series of experiments with real datasets that SSBM performs on par with supervised learning and outperforms sparse reconstruction methods with a fixed wavelet basis by a large margin.
    Transfer Learning Based Diagnosis and Analysis of Lung Sound Aberrations. (arXiv:2303.08362v1 [cs.SD])
    With the development of computer -systems that can collect and analyze enormous volumes of data, the medical profession is establishing several non-invasive tools. This work attempts to develop a non-invasive technique for identifying respiratory sounds acquired by a stethoscope and voice recording software via machine learning techniques. This study suggests a trained and proven CNN-based approach for categorizing respiratory sounds. A visual representation of each audio sample is constructed, allowing resource identification for classification using methods like those used to effectively describe visuals. We used a technique called Mel Frequency Cepstral Coefficients (MFCCs). Here, features are retrieved and categorized via VGG16 (transfer learning) and prediction is accomplished using 5-fold cross-validation. Employing various data splitting techniques, Respiratory Sound Database obtained cutting-edge results, including accuracy of 95%, precision of 88%, recall score of 86%, and F1 score of 81%. The ICBHI dataset is used to train and test the model.
    Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images. (arXiv:2303.08632v1 [eess.IV])
    Explainability is a key requirement for computer-aided diagnosis systems in clinical decision-making. Multiple instance learning with attention pooling provides instance-level explainability, however for many clinical applications a deeper, pixel-level explanation is desirable, but missing so far. In this work, we investigate the use of four attribution methods to explain a multiple instance learning models: GradCAM, Layer-Wise Relevance Propagation (LRP), Information Bottleneck Attribution (IBA), and InputIBA. With this collection of methods, we can derive pixel-level explanations on for the task of diagnosing blood cancer from patients' blood smears. We study two datasets of acute myeloid leukemia with over 100 000 single cell images and observe how each attribution method performs on the multiple instance learning architecture focusing on different properties of the white blood single cells. Additionally, we compare attribution maps with the annotations of a medical expert to see how the model's decision-making differs from the human standard. Our study addresses the challenge of implementing pixel-level explainability in multiple instance learning models and provides insights for clinicians to better understand and trust decisions from computer-aided diagnosis systems.
    Understanding Post-hoc Explainers: The Case of Anchors. (arXiv:2303.08806v1 [stat.ML])
    In many scenarios, the interpretability of machine learning models is a highly required but difficult task. To explain the individual predictions of such models, local model-agnostic approaches have been proposed. However, the process generating the explanations can be, for a user, as mysterious as the prediction to be explained. Furthermore, interpretability methods frequently lack theoretical guarantees, and their behavior on simple models is frequently unknown. While it is difficult, if not impossible, to ensure that an explainer behaves as expected on a cutting-edge model, we can at least ensure that everything works on simple, already interpretable models. In this paper, we present a theoretical analysis of Anchors (Ribeiro et al., 2018): a popular rule-based interpretability method that highlights a small set of words to explain a text classifier's decision. After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results when used with linear text classifiers on top of a TF-IDF vectorization. We believe that our analysis framework can aid in the development of new explainability methods based on solid theoretical foundations.
    MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting. (arXiv:2210.07179v2 [cs.CV] UPDATED)
    Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models and leverages their strong generalization capabilities in multimodal vision-language (VL) settings. MAPL learns a lightweight mapping between the representation spaces of unimodal models using aligned image-text data, and can generalize to unseen VL tasks from just a few in-context examples. The small number of trainable parameters makes MAPL effective at low-data and in-domain learning. Moreover, MAPL's modularity enables easy extension to other pre-trained models. Extensive experiments on several visual question answering and image captioning benchmarks show that MAPL achieves superior or competitive performance compared to similar methods while training orders of magnitude fewer parameters. MAPL can be trained in just a few hours using modest computational resources and public datasets. We release our code and pre-trained model weights at https://github.com/mair-lab/mapl.
    The Benefits of Mixup for Feature Learning. (arXiv:2303.08433v1 [cs.LG])
    Mixup, a simple data augmentation method that randomly mixes two data points via linear interpolation, has been extensively applied in various deep learning applications to gain better generalization. However, the theoretical underpinnings of its efficacy are not yet fully understood. In this paper, we aim to seek a fundamental understanding of the benefits of Mixup. We first show that Mixup using different linear interpolation parameters for features and labels can still achieve similar performance to the standard Mixup. This indicates that the intuitive linearity explanation in Zhang et al., (2018) may not fully explain the success of Mixup. Then we perform a theoretical study of Mixup from the feature learning perspective. We consider a feature-noise data model and show that Mixup training can effectively learn the rare features (appearing in a small fraction of data) from its mixture with the common features (appearing in a large fraction of data). In contrast, standard training can only learn the common features but fails to learn the rare features, thus suffering from bad generalization performance. Moreover, our theoretical analysis also shows that the benefits of Mixup for feature learning are mostly gained in the early training phase, based on which we propose to apply early stopping in Mixup. Experimental results verify our theoretical findings and demonstrate the effectiveness of the early-stopped Mixup training.
    From Images to Features: Unbiased Morphology Classification via Variational Auto-Encoders and Domain Adaptation. (arXiv:2303.08627v1 [astro-ph.GA])
    We present a novel approach for the dimensionality reduction of galaxy images by leveraging a combination of variational auto-encoders (VAE) and domain adaptation (DA). We demonstrate the effectiveness of this approach using a sample of low redshift galaxies with detailed morphological type labels from the Galaxy-Zoo DECaLS project. We show that 40-dimensional latent variables can effectively reproduce most morphological features in galaxy images. To further validate the effectiveness of our approach, we utilised a classical random forest (RF) classifier on the 40-dimensional latent variables to make detailed morphology feature classifications. This approach performs similarly to a direct neural network application on galaxy images. We further enhance our model by tuning the VAE network via DA using galaxies in the overlapping footprint of DECaLS and BASS+MzLS, enabling the unbiased application of our model to galaxy images in both surveys. We observed that noise suppression during DA led to even better morphological feature extraction and classification performance. Overall, this combination of VAE and DA can be applied to achieve image dimensionality reduction, defect image identification, and morphology classification in large optical surveys.
    Fashion-model pose recommendation and generation using Machine Learning. (arXiv:2303.08660v1 [cs.CV])
    Fashion-model pose is an important attribute in the fashion industry. Creative directors, modeling production houses, and top photographers always look for professional models able to pose. without the skill to correctly pose, their chances of landing professional modeling employment are regrettably quite little. There are occasions when models and photographers are unsure of the best pose to strike while taking photographs. This research concentrates on suggesting the fashion personnel a series of similar images based on the input image. The image is segmented into different parts and similar images are suggested for the user. This was achieved by calculating the color histogram of the input image and applying the same for all the images in the dataset and comparing the histograms. Synthetic images have become popular to avoid privacy concerns and to overcome the high cost of photoshoots. Hence, this paper also extends the work of generating synthetic images from the recommendation engine using styleGAN to an extent.
    Distribution-free Deviation Bounds of Learning via Model Selection with Cross-validation Risk Estimation. (arXiv:2303.08777v1 [stat.ML])
    Cross-validation techniques for risk estimation and model selection are widely used in statistics and machine learning. However, the understanding of the theoretical properties of learning via model selection with cross-validation risk estimation is quite low in face of its widespread use. In this context, this paper presents learning via model selection with cross-validation risk estimation as a general systematic learning framework within classical statistical learning theory and establishes distribution-free deviation bounds in terms of VC dimension, giving detailed proofs of the results and considering both bounded and unbounded loss functions. We also deduce conditions under which the deviation bounds of learning via model selection are tighter than that of learning via empirical risk minimization in the whole hypotheses space, supporting the better performance of model selection frameworks observed empirically in some instances.
    ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces. (arXiv:2101.07415v6 [cs.LG] UPDATED)
    In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters. We demonstrate that previous evolutionary algorithms which rely on mutation-based approaches, while flexible over combinatorial spaces, suffer from a curse of dimensionality in high dimensional continuous spaces both theoretically and empirically, which thus limits their scope over hybrid search spaces as well. In order to combat this curse, we propose ES-ENAS, a simple and modular joint optimization procedure combining the class of sample-efficient smoothed gradient techniques, commonly known as Evolutionary Strategies (ES), with combinatorial optimizers in a highly scalable and intuitive way, inspired by the one-shot or supernet paradigm introduced in Efficient Neural Architecture Search (ENAS). By doing so, we achieve significantly more sample efficiency, which we empirically demonstrate over synthetic benchmarks, and are further able to apply ES-ENAS for architecture search over popular RL benchmarks.
    DCT-Former: Efficient Self-Attention with Discrete Cosine Transform. (arXiv:2203.01178v3 [cs.LG] UPDATED)
    Since their introduction the Trasformer architectures emerged as the dominating architectures for both natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of "fully-attentive" architectures arises from the computation of the dot-product attention, which grows both in memory consumption and number of operations as $O(n^2)$ where $n$ stands for the input sequence length, thus limiting the applications that require modeling very long sequences. Several approaches have been proposed so far in the literature to mitigate this issue, with varying degrees of success. Our idea takes inspiration from the world of lossy data compression (such as the JPEG algorithm) to derive an approximation of the attention module by leveraging the properties of the Discrete Cosine Transform. An extensive section of experiments shows that our method takes up less memory for the same performance, while also drastically reducing inference time. This makes it particularly suitable in real-time contexts on embedded platforms. Moreover, we assume that the results of our research might serve as a starting point for a broader family of deep neural models with reduced memory footprint. The implementation will be made publicly available at https://github.com/cscribano/DCT-Former-Public
    Computation Offloading in Heterogeneous Vehicular Edge Networks: On-line and Off-policy Bandit Solutions. (arXiv:2008.06302v2 [cs.NI] UPDATED)
    With the rapid advancement of Intelligent Transportation Systems (ITS) and vehicular communications, Vehicular Edge Computing (VEC) is emerging as a promising technology to support low-latency ITS applications and services. In this paper, we consider the computation offloading problem from mobile vehicles/users in a heterogeneous VEC scenario, and focus on the network- and base station selection problems, where different networks have different traffic loads. In a fast-varying vehicular environment, computation offloading experience of users is strongly affected by the latency due to the congestion at the edge computing servers co-located with the base stations. However, as a result of the non-stationary property of such an environment and also information shortage, predicting this congestion is an involved task. To address this challenge, we propose an on-line learning algorithm and an off-policy learning algorithm based on multi-armed bandit theory. To dynamically select the least congested network in a piece-wise stationary environment, these algorithms predict the latency that the offloaded tasks experience using the offloading history. In addition, to minimize the task loss due to the mobility of the vehicles, we develop a method for base station selection. Moreover, we propose a relaying mechanism for the selected network, which operates based on the sojourn time of the vehicles. Through intensive numerical analysis, we demonstrate that the proposed learning-based solutions adapt to the traffic changes of the network by selecting the least congested network, thereby reducing the latency of offloaded tasks. Moreover, we demonstrate that the proposed joint base station selection and the relaying mechanism minimize the task loss in a vehicular environment.
    Policy Adaptation from Foundation Model Feedback. (arXiv:2212.07398v3 [cs.LG] UPDATED)
    Recent progress on vision-language foundation models have brought significant advancement to building general-purpose robots. By using the pre-trained models to encode the scene and instructions as inputs for decision making, the instruction-conditioned policy can generalize across different objects and tasks. While this is encouraging, the policy still fails in most cases given an unseen task or environment. In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF). When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions to record the demonstrations. While the execution could be wrong, we can use the pre-trained foundation models to provide feedback to relabel the demonstrations. This automatically provides new pairs of demonstration-instruction data for policy fine-tuning. We evaluate our method on a broad range of experiments with the focus on generalization on unseen objects, unseen tasks, unseen environments, and sim-to-real transfer. We show PAFF improves baselines by a large margin in all cases. Our project page is available at https://geyuying.github.io/PAFF/
    Reversing the Abnormal: Pseudo-Healthy Generative Networks for Anomaly Detection. (arXiv:2303.08452v1 [eess.IV])
    Early and accurate disease detection is crucial for patient management and successful treatment outcomes. However, the automatic identification of anomalies in medical images can be challenging. Conventional methods rely on large labeled datasets which are difficult to obtain. To overcome these limitations, we introduce a novel unsupervised approach, called PHANES (Pseudo Healthy generative networks for ANomaly Segmentation). Our method has the capability of reversing anomalies, i.e., preserving healthy tissue and replacing anomalous regions with pseudo-healthy (PH) reconstructions. Unlike recent diffusion models, our method does not rely on a learned noise distribution nor does it introduce random alterations to the entire image. Instead, we use latent generative networks to create masks around possible anomalies, which are refined using inpainting generative networks. We demonstrate the effectiveness of PHANES in detecting stroke lesions in T1w brain MRI datasets and show significant improvements over state-of-the-art (SOTA) methods. We believe that our proposed framework will open new avenues for interpretable, fast, and accurate anomaly segmentation with the potential to support various clinical-oriented downstream tasks.
    Practicality of generalization guarantees for unsupervised domain adaptation with neural networks. (arXiv:2303.08720v1 [cs.LG])
    Understanding generalization is crucial to confidently engineer and deploy machine learning models, especially when deployment implies a shift in the data domain. For such domain adaptation problems, we seek generalization bounds which are tractably computable and tight. If these desiderata can be reached, the bounds can serve as guarantees for adequate performance in deployment. However, in applications where deep neural networks are the models of choice, deriving results which fulfill these remains an unresolved challenge; most existing bounds are either vacuous or has non-estimable terms, even in favorable conditions. In this work, we evaluate existing bounds from the literature with potential to satisfy our desiderata on domain adaptation image classification tasks, where deep neural networks are preferred. We find that all bounds are vacuous and that sample generalization terms account for much of the observed looseness, especially when these terms interact with measures of domain shift. To overcome this and arrive at the tightest possible results, we combine each bound with recent data-dependent PAC-Bayes analysis, greatly improving the guarantees. We find that, when domain overlap can be assumed, a simple importance weighting extension of previous work provides the tightest estimable bound. Finally, we study which terms dominate the bounds and identify possible directions for further improvement.
    Generating symbolic music using diffusion models. (arXiv:2303.08385v1 [cs.SD])
    Probabilistic Denoising Diffusion models have emerged as simple yet very powerful generative models. Diffusion models unlike other generative models do not suffer from mode collapse nor require a discriminator to generate high quality samples. In this paper, we propose a diffusion model that uses a binomial prior distribution to generate piano-rolls. The paper also proposes an efficient method to train the model and generate samples. The generated music has coherence at time scales up to the length of the training piano-roll segments. We show how such a model is conditioned on the input and can be used to harmonize a given melody, complete an incomplete piano-roll or generate a variation of a given piece. The code is shared publicly to encourage the use and development of the method by the community.
    Adapting U-Net for linear elastic stress estimation in polycrystal Zr microstructures. (arXiv:2303.08541v1 [cond-mat.mtrl-sci])
    A variant of the U-Net convolutional neural network architecture is proposed to estimate linear elastic compatibility stresses in a-Zr (hcp) polycrystalline grain structures. Training data was generated using VGrain software with a regularity alpha of 0.73 and uniform random orientation for the grain structures and ABAQUS to evaluate the stress welds using the finite element method. The initial dataset contains 200 samples with 20 held from training for validation. The network gives speedups of around 200x to 6000x using a CPU or GPU, with signifcant memory savings, compared to finite element analysis with a modest reduction in accuracy of up to 10%. Network performance is not correlated with grain structure regularity or texture, showing generalisation of the network beyond the training set to arbitrary Zr crystal structures. Performance when trained with 200 and 400 samples was measured, finding an improvement in accuracy of approximately 10% when the size of the dataset was doubled.
    High-dimensional multi-view clustering methods. (arXiv:2303.08582v1 [cs.LG])
    Multi-view clustering has been widely used in recent years in comparison to single-view clustering, for clear reasons, as it offers more insights into the data, which has brought with it some challenges, such as how to combine these views or features. Most of recent work in this field focuses mainly on tensor representation instead of treating the data as simple matrices. This permits to deal with the high-order correlation between the data which the based matrix approach struggles to capture. Accordingly, we will examine and compare these approaches, particularly in two categories, namely graph-based clustering and subspace-based clustering. We will conduct and report experiments of the main clustering methods over a benchmark datasets.
    Making Vision Transformers Efficient from A Token Sparsification View. (arXiv:2303.08685v1 [cs.CV])
    The quadratic computational complexity to the number of tokens limits the practical applications of Vision Transformers (ViTs). Several works propose to prune redundant tokens to achieve efficient ViTs. However, these methods generally suffer from (i) dramatic accuracy drops, (ii) application difficulty in the local vision transformer, and (iii) non-general-purpose networks for downstream tasks. In this work, we propose a novel Semantic Token ViT (STViT), for efficient global and local vision transformers, which can also be revised to serve as backbone for downstream tasks. The semantic tokens represent cluster centers, and they are initialized by pooling image tokens in space and recovered by attention, which can adaptively represent global or local semantic information. Due to the cluster properties, a few semantic tokens can attain the same effect as vast image tokens, for both global and local vision transformers. For instance, only 16 semantic tokens on DeiT-(Tiny,Small,Base) can achieve the same accuracy with more than 100% inference speed improvement and nearly 60% FLOPs reduction; on Swin-(Tiny,Small,Base), we can employ 16 semantic tokens in each window to further speed it up by around 20% with slight accuracy increase. Besides great success in image classification, we also extend our method to video recognition. In addition, we design a STViT-R(ecover) network to restore the detailed spatial information based on the STViT, making it work for downstream tasks, which is powerless for previous token sparsification methods. Experiments demonstrate that our method can achieve competitive results compared to the original networks in object detection and instance segmentation, with over 30% FLOPs reduction for backbone.
    MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. (arXiv:2303.08374v1 [cs.DC])
    In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, and necessitated distribution among processors. Training such massive models necessitates advanced parallelism strategies to maintain efficiency. However, such distributed DL parallelism strategies require a varied mixture of collective and point-to-point communication operations across a broad range of message sizes and scales. Examples of models using advanced parallelism strategies include Deep Learning Recommendation Models (DLRM) and Mixture-of-Experts (MoE). Communication libraries' performance varies wildly across different communication operations, scales, and message sizes. We propose MCR-DL: an extensible DL communication framework that supports all point-to-point and collective operations while enabling users to dynamically mix-and-match communication backends for a given operation without deadlocks. MCR-DL also comes packaged with a tuning suite for dynamically selecting the best communication backend for a given input tensor. We select DeepSpeed-MoE and DLRM as candidate DL models and demonstrate a 31% improvement in DS-MoE throughput on 256 V100 GPUs on the Lassen HPC system. Further, we achieve a 20% throughput improvement in a dense Megatron-DeepSpeed model and a 25% throughput improvement in DLRM on 32 A100 GPUs with the Theta-GPU HPC system.
    MAHTM: A Multi-Agent Framework for Hierarchical Transactive Microgrids. (arXiv:2303.08447v1 [cs.LG])
    Integrating variable renewable energy into the grid has posed challenges to system operators in achieving optimal trade-offs among energy availability, cost affordability, and pollution controllability. This paper proposes a multi-agent reinforcement learning framework for managing energy transactions in microgrids. The framework addresses the challenges above: it seeks to optimize the usage of available resources by minimizing the carbon footprint while benefiting all stakeholders. The proposed architecture consists of three layers of agents, each pursuing different objectives. The first layer, comprised of prosumers and consumers, minimizes the total energy cost. The other two layers control the energy price to decrease the carbon impact while balancing the consumption and production of both renewable and conventional energy. This framework also takes into account fluctuations in energy demand and supply.
    Hybrid-Physical Probabilistic Forecasting for a Set of Photovoltaic Systems using Recurrent Neural Networks. (arXiv:2303.08459v1 [cs.LG])
    Accurate intra-day forecasts of the power output by PhotoVoltaic (PV) systems are critical to improve the operation of energy distribution grids. We describe a hybrid-physical model, which aims at improving deterministic intra-day forecasts, issued by a PV performance model fed by Numerical Weather Predictions (NWP), by using them as covariates in the context of an autoregressive recurrent neural model. Our proposal repurposes a neural model initially used in the retail sector, and discloses a novel truncated Gaussian output distribution. We experimentally compare many model variants to alternatives from the literature, and an ablation study shows that the components in the best performing variant work synergistically to reach a skill score of 7.54% with respect to the NWP-driven PV performance model baseline.
    EGFR mutation prediction using F18-FDG PET-CT based radiomics features in non-small cell lung cancer. (arXiv:2303.08569v1 [q-bio.QM])
    Lung cancer is the leading cause of cancer death in the world. Accurate determination of the EGFR (epidermal growth factor receptor) mutation status is highly relevant for the proper treatment of this patients. Purpose: The aim of this study was to predict the mutational status of the EGFR in non-small cell lung cancer patients using radiomics features extracted from PET-CT images. Methods: Retrospective study that involve 34 patients with lung cancer confirmed by histology and EGFR status mutation assessment. A total of 2.205 radiomics features were extracted from manual segmentation of the PET-CT images using pyradiomics library. Both computed tomography and positron emission tomography images were used. All images were acquired with intravenous iodinated contrast and F18-FDG. Preprocessing includes resampling, normalization, and discretization of the pixel intensity. Three methods were used for the feature selection process: backward selection (set 1), forward selection (set 2), and feature importance analysis of random forest model (set 3). Nine machine learning methods were used for radiomics model building. Results: 35.2% of patients had EGFR mutation, without significant differences in age, gender, tumor size and SUVmax. After the feature selection process 6, 7 and 17 radiomics features were selected, respectively in each group. The best performances were obtained by Ridge Regression in set 1: AUC of 0.826 (95% CI, 0.811 - 0.839), Random Forest in set 2: AUC of 0.823 (95% CI, 0.808 - 0.838) and Neural Network in set 3: AUC of 0.821 (95% CI, 0.808 - 0.835). Conclusion: The radiomics features analysis has the potential of predicting clinically relevant mutations in lung cancer patients through a non-invasive methodology.
    Rediscovery of CNN's Versatility for Text-based Encoding of Raw Electronic Health Records. (arXiv:2303.08290v1 [cs.LG])
    Making the most use of abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds entire features in raw EHR data regardless of its form and medical code standards. The framework, however, only focuses on encoding EHR with minimal preprocessing and fails to consider how to learn efficient EHR representation in terms of computation and memory usage. In this paper, we search for a versatile encoder not only reducing the large data into a manageable size but also well preserving the core information of patients to perform diverse clinical tasks. We found that hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, it turns out that making use of the inherent hierarchy of EHR data can boost the performance of any kind of backbone models and clinical tasks performed. Through extensive experiments, we present concrete evidence to generalize our research findings into real-world practice. We give a clear guideline on building the encoder based on the research findings captured while exploring numerous settings.
    Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting. (arXiv:2303.08331v1 [cs.CV])
    As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks are expected to ensure good overfitting quality, which substantially increases the storage and consumes more bandwidth resources for data transmission. On the other hand, decreasing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution speed. To reconcile such, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to minimum. Additionally, we advance our method into a single overfitting model by a data-aware joint training technique, which further reduces the storage requirement with negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state-of-the-art, our method achieves 28 fps streaming speed with 41.6 PSNR, which is 14$\times$ faster and 2.29 dB better in the live video resolution upscaling tasks. Our codes are available at: https://github.com/coulsonlee/STDO-CVPR2023.git
    Fair Off-Policy Learning from Observational Data. (arXiv:2303.08516v1 [cs.LG])
    Businesses and organizations must ensure that their algorithmic decision-making is fair in order to meet legislative, ethical, and societal demands. For example, decision-making in automated hiring must not discriminate with respect to gender or race. To achieve this, prior research has contributed approaches that ensure algorithmic fairness in machine learning predictions, while comparatively little effort has focused on algorithmic fairness in decision models, specifically off-policy learning. In this paper, we propose a novel framework for fair off-policy learning: we learn decision rules from observational data under different notions of fairness, where we explicitly assume that observational data were collected under a different -- potentially biased -- behavioral policy. For this, we first formalize different fairness notions for off-policy learning. We then propose a machine learning approach to learn optimal policies under these fairness notions. Specifically, we reformulate the fairness notions into unconstrained learning objectives that can be estimated from finite samples. Here, we leverage machine learning to minimize the objective constrained on a fair representation of the data, so that the resulting policies satisfy our fairness notions. We further provide theoretical guarantees in form of generalization bounds for the finite-sample version of our framework. We demonstrate the effectiveness of our framework through extensive numerical experiments using both simulated and real-world data. As a result, our work enables algorithmic decision-making in a wide array of practical applications where fairness must ensured.
    Model Extraction Attacks on Split Federated Learning. (arXiv:2303.08581v1 [cs.LG])
    Federated Learning (FL) is a popular collaborative learning scheme involving multiple clients and a server. FL focuses on protecting clients' data but turns out to be highly vulnerable to Intellectual Property (IP) threats. Since FL periodically collects and distributes the model parameters, a free-rider can download the latest model and thus steal model IP. Split Federated Learning (SFL), a recent variant of FL that supports training with resource-constrained clients, splits the model into two, giving one part of the model to clients (client-side model), and the remaining part to the server (server-side model). Thus SFL prevents model leakage by design. Moreover, by blocking prediction queries, it can be made resistant to advanced IP threats such as traditional Model Extraction (ME) attacks. While SFL is better than FL in terms of providing IP protection, it is still vulnerable. In this paper, we expose the vulnerability of SFL and show how malicious clients can launch ME attacks by querying the gradient information from the server side. We propose five variants of ME attack which differs in the gradient usage as well as in the data assumptions. We show that under practical cases, the proposed ME attacks work exceptionally well for SFL. For instance, when the server-side model has five layers, our proposed ME attack can achieve over 90% accuracy with less than 2% accuracy degradation with VGG-11 on CIFAR-10.
    Delay-SDE-net: A deep learning approach for time series modelling with memory and uncertainty estimates. (arXiv:2303.08587v1 [cs.LG])
    To model time series accurately is important within a wide range of fields. As the world is generally too complex to be modelled exactly, it is often meaningful to assess the probability of a dynamical system to be in a specific state. This paper presents the Delay-SDE-net, a neural network model based on stochastic delay differential equations (SDDEs). The use of SDDEs with multiple delays as modelling framework makes it a suitable model for time series with memory effects, as it includes memory through previous states of the system. The stochastic part of the Delay-SDE-net provides a basis for estimating uncertainty in modelling, and is split into two neural networks to account for aleatoric and epistemic uncertainty. The uncertainty is provided instantly, making the model suitable for applications where time is sparse. We derive the theoretical error of the Delay-SDE-net and analyze the convergence rate numerically. At comparisons with similar models, the Delay-SDE-net has consistently the best performance, both in predicting time series values and uncertainties.
    Learning to Grow Artificial Hippocampi in Vision Transformers for Resilient Lifelong Learning. (arXiv:2303.08250v1 [cs.CV])
    Lifelong learning without catastrophic forgetting (i.e., resiliency) possessed by human intelligence is entangled with sophisticated memory mechanisms in the brain, especially the long-term memory (LM) maintained by Hippocampi. To a certain extent, Transformers have emerged as the counterpart ``Brain" of Artificial Intelligence (AI), and yet leave the LM component under-explored for lifelong learning settings. This paper presents a method of learning to grow Artificial Hippocampi (ArtiHippo) in Vision Transformers (ViTs) for resilient lifelong learning. With a comprehensive ablation study, the final linear projection layer in the multi-head self-attention (MHSA) block is selected in realizing and growing ArtiHippo. ArtiHippo is represented by a mixture of experts (MoEs). Each expert component is an on-site variant of the linear projection layer, maintained via neural architecture search (NAS) with the search space defined by four basic growing operations -- skip, reuse, adapt, and new in lifelong learning. The LM of a task consists of two parts: the dedicated expert components (as model parameters) at different layers of a ViT learned via NAS, and the mean class-tokens (as stored latent vectors for measuring task similarity) associated with the expert components. For a new task, a hierarchical task-similarity-oriented exploration-exploitation sampling based NAS is proposed to learn the expert components. The task similarity is measured based on the normalized cosine similarity between the mean class-token of the new task and those of old tasks. The proposed method is complementary to prompt-based lifelong learningwith ViTs. In experiments, the proposed method is tested on the challenging Visual Domain Decathlon (VDD) benchmark and the recently proposed 5-Dataset benchmark. It obtains consistently better performance than the prior art with sensible ArtiHippo learned continually.
    Stochastic Interpolants: A Unifying Framework for Flows and Diffusions. (arXiv:2303.08797v1 [cs.LG])
    We introduce a class of generative models based on the stochastic interpolant framework proposed in Albergo & Vanden-Eijnden (2023) that unifies flow-based and diffusion-based methods. We first show how to construct a broad class of continuous-time stochastic processes whose time-dependent probability density function bridges two arbitrary densities exactly in finite time. These `stochastic interpolants' are built by combining data from the two densities with an additional latent variable, and the specific details of the construction can be leveraged to shape the resulting time-dependent density in a flexible way. We then show that the time-dependent density of the stochastic interpolant satisfies a first-order transport equation as well as a family of forward and backward Fokker-Planck equations with tunable diffusion; upon consideration of the time evolution of an individual sample, this viewpoint immediately leads to both deterministic and stochastic generative models based on probability flow equations or stochastic differential equations with a tunable level of noise. The drift coefficients entering these models are time-dependent velocity fields characterized as the unique minimizers of simple quadratic objective functions, one of which is a new objective for the score of the interpolant density. Remarkably, we show that minimization of these quadratic objectives leads to control of the likelihood for generative models built upon stochastic dynamics; by contrast, we show that generative models based upon a deterministic dynamics must, in addition, control the Fisher divergence between the target and the model. Finally, we construct estimators for the likelihood and the cross-entropy of interpolant-based generative models, and demonstrate that such models recover the Schr\"odinger bridge between the two target densities when explicitly optimizing over the interpolant.
    Is forgetting less a good inductive bias for forward transfer?. (arXiv:2303.08207v1 [cs.LG])
    One of the main motivations of studying continual learning is that the problem setting allows a model to accrue knowledge from past tasks to learn new tasks more efficiently. However, recent studies suggest that the key metric that continual learning algorithms optimize, reduction in catastrophic forgetting, does not correlate well with the forward transfer of knowledge. We believe that the conclusion previous works reached is due to the way they measure forward transfer. We argue that the measure of forward transfer to a task should not be affected by the restrictions placed on the continual learner in order to preserve knowledge of previous tasks. Instead, forward transfer should be measured by how easy it is to learn a new task given a set of representations produced by continual learning on previous tasks. Under this notion of forward transfer, we evaluate different continual learning algorithms on a variety of image classification benchmarks. Our results indicate that less forgetful representations lead to a better forward transfer suggesting a strong correlation between retaining past information and learning efficiency on new tasks. Further, we found less forgetful representations to be more diverse and discriminative compared to their forgetful counterparts.
    PULSNAR -- Positive unlabeled learning selected not at random: class proportion estimation when the SCAR assumption does not hold. (arXiv:2303.08269v1 [cs.LG])
    Positive and Unlabeled (PU) learning is a type of semi-supervised binary classification where the machine learning algorithm differentiates between a set of positive instances (labeled) and a set of both positive and negative instances (unlabeled). PU learning has broad applications in settings where confirmed negatives are unavailable or difficult to obtain, and there is value in discovering positives among the unlabeled (e.g., viable drugs among untested compounds). Most PU learning algorithms make the selected completely at random (SCAR) assumption, namely that positives are selected independently of their features. However, in many real-world applications, such as healthcare, positives are not SCAR (e.g., severe cases are more likely to be diagnosed), leading to a poor estimate of the proportion, $\alpha$, of positives among unlabeled examples and poor model calibration, resulting in an uncertain decision threshold for selecting positives. PU learning algorithms can estimate $\alpha$ or the probability of an individual unlabeled instance being positive or both. We propose two PU learning algorithms to estimate $\alpha$, calculate calibrated probabilities for PU instances, and improve classification metrics: i) PULSCAR (positive unlabeled learning selected completely at random), and ii) PULSNAR (positive unlabeled learning selected not at random). PULSNAR uses a divide-and-conquer approach that creates and solves several SCAR-like sub-problems using PULSCAR. In our experiments, PULSNAR outperformed state-of-the-art approaches on both synthetic and real-world benchmark datasets.
    FairAdaBN: Mitigating unfairness with adaptive batch normalization and its application to dermatological disease classification. (arXiv:2303.08325v1 [cs.LG])
    Deep learning is becoming increasingly ubiquitous in medical research and applications while involving sensitive information and even critical diagnosis decisions. Researchers observe a significant performance disparity among subgroups with different demographic attributes, which is called model unfairness, and put lots of effort into carefully designing elegant architectures to address unfairness, which poses heavy training burden, brings poor generalization, and reveals the trade-off between model performance and fairness. To tackle these issues, we propose FairAdaBN by making batch normalization adaptive to sensitive attribute. This simple but effective design can be adopted to several classification backbones that are originally unaware of fairness. Additionally, we derive a novel loss function that restrains statistical parity between subgroups on mini-batches, encouraging the model to converge with considerable fairness. In order to evaluate the trade-off between model performance and fairness, we propose a new metric, named Fairness-Accuracy Trade-off Efficiency (FATE), to compute normalized fairness improvement over accuracy drop. Experiments on two dermatological datasets show that our proposed method outperforms other methods on fairness criteria and FATE.
    DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision. (arXiv:2303.08403v1 [cs.LG])
    Algorithmic fairness has become an important machine learning problem, especially for mission-critical Web applications. This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations. Unlike existing models that target a single type of fairness, our model jointly optimizes for two fairness criteria - group fairness and counterfactual fairness - and hence makes fairer predictions at both the group and individual levels. Our model uses contrastive loss to generate embeddings that are indistinguishable for each protected group, while forcing the embeddings of counterfactual pairs to be similar. It then uses a self-knowledge distillation method to maintain the quality of representation for the downstream tasks. Extensive analysis over multiple datasets confirms the model's validity and further shows the synergy of jointly addressing two fairness criteria, suggesting the model's potential value in fair intelligent Web applications.
    Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models. (arXiv:2303.08343v1 [eess.AS])
    Continued improvements in machine learning techniques offer exciting new opportunities through the use of larger models and larger training datasets. However, there is a growing need to offer these new capabilities on-board low-powered devices such as smartphones, wearables and other embedded environments where only low memory is available. Towards this, we consider methods to reduce the model size of Conformer-based speech recognition models which typically require models with greater than 100M parameters down to just $5$M parameters while minimizing impact on model quality. Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors. We propose model weight reuse at different levels within our model architecture: (i) repeating full conformer block layers, (ii) sharing specific conformer modules across layers, (iii) sharing sub-components per conformer module, and (iv) sharing decomposed sub-component weights after low-rank decomposition. By sharing weights at different levels of our model, we can retain the full model in-memory while increasing the number of virtual transformations applied to the input. Through a series of ablation studies and evaluations, we find that with weight sharing and a low-rank architecture, we can achieve a WER of 2.84 and 2.94 for Librispeech dev-clean and test-clean respectively with a $5$M parameter model.
    A Comprehensive Study on Post-Training Quantization for Large Language Models. (arXiv:2303.08302v1 [cs.LG])
    Post-training quantization (\ptq) had been recently shown as a compromising method to reduce the memory consumption and/or compute cost for large language models. However, a comprehensive study about the effect of different quantization schemes, different model families, different \ptq methods, different quantization bit precision, etc, is still missing. In this work, we provide an extensive study on those components over tens of thousands of zero-shot experiments. Our results show that (1) Fine-grained quantization and \ptq methods (instead of naive round-to-nearest quantization) are necessary to achieve good accuracy and (2) Higher bits (e.g., 5 bits) with coarse-grained quantization is more powerful than lower bits (e.g., 4 bits) with very fine-grained quantization (whose effective bits is similar to 5-bits). We also present recommendations about how to utilize quantization for \llms with different sizes, and leave suggestions of future opportunities and system work that are not resolved in this work.
    Attention-likelihood relationship in transformers. (arXiv:2303.08288v1 [cs.CL])
    We analyze how large language models (LLMs) represent out-of-context words, investigating their reliance on the given context to capture their semantics. Our likelihood-guided text perturbations reveal a correlation between token likelihood and attention values in transformer-based language models. Extensive experiments reveal that unexpected tokens cause the model to attend less to the information coming from themselves to compute their representations, particularly at higher layers. These findings have valuable implications for assessing the robustness of LLMs in real-world scenarios. Fully reproducible codebase at https://github.com/Flegyas/AttentionLikelihood.
    DeDA: Deep Directed Accumulator. (arXiv:2303.08434v1 [eess.IV])
    Chronic active multiple sclerosis lesions, also termed as rim+ lesions, can be characterized by a hyperintense rim at the edge of the lesion on quantitative susceptibility maps. These rim+ lesions exhibit a geometrically simple structure, where gradients at the lesion edge are radially oriented and a greater magnitude of gradients is observed in contrast to rim- (non rim+) lesions. However, recent studies have shown that the identification performance of such lesions remains unsatisfied due to the limited amount of data and high class imbalance. In this paper, we propose a simple yet effective image processing operation, deep directed accumulator (DeDA), that provides a new perspective for injecting domain-specific inductive biases (priors) into neural networks for rim+ lesion identification. Given a feature map and a set of sampling grids, DeDA creates and quantizes an accumulator space into finite intervals, and accumulates feature values accordingly. This DeDA operation is a generalized discrete Radon transform and can also be regarded as a symmetric operation to the grid sampling within the forward-backward neural network framework, the process of which is order-agnostic, and can be efficiently implemented with the native CUDA programming. Experimental results on a dataset with 177 rim+ and 3986 rim- lesions show that 10.1% of improvement in a partial (false positive rate<0.1) area under the receiver operating characteristic curve (pROC AUC) and 10.2% of improvement in an area under the precision recall curve (PR AUC) can be achieved respectively comparing to other state-of-the-art methods. The source code is available online at https://github.com/tinymilky/DeDA
    Physics-Informed Optical Kernel Regression Using Complex-valued Neural Fields. (arXiv:2303.08435v1 [cs.CV])
    Lithography is fundamental to integrated circuit fabrication, necessitating large computation overhead. The advancement of machine learning (ML)-based lithography models alleviates the trade-offs between manufacturing process expense and capability. However, all previous methods regard the lithography system as an image-to-image black box mapping, utilizing network parameters to learn by rote mappings from massive mask-to-aerial or mask-to-resist image pairs, resulting in poor generalization capability. In this paper, we propose a new ML-based paradigm disassembling the rigorous lithographic model into non-parametric mask operations and learned optical kernels containing determinant source, pupil, and lithography information. By optimizing complex-valued neural fields to perform optical kernel regression from coordinates, our method can accurately restore lithography system using a small-scale training dataset with fewer parameters, demonstrating superior generalization capability as well. Experiments show that our framework can use 31\% of parameters while achieving 69$\times$ smaller mean squared error with 1.3$\times$ higher throughput than the state-of-the-art.
    Chat with the Environment: Interactive Multimodal Perception using Large Language Models. (arXiv:2303.08268v1 [cs.RO])
    Programming robot behaviour in a complex world faces challenges on multiple levels, from dextrous low-level skills to high-level planning and reasoning. Recent pre-trained Large Language Models (LLMs) have shown remarkable reasoning ability in zero-shot robotic planning. However, it remains challenging to ground LLMs in multimodal sensory input and continuous action output, while enabling a robot to interact with its environment and acquire novel information as its policies unfold. We develop a robot interaction scenario with a partially observable state, which necessitates a robot to decide on a range of epistemic actions in order to sample sensory information among multiple modalities, before being able to execute the task correctly. An interactive perception framework is therefore proposed with an LLM as its backbone, whose ability is exploited to instruct epistemic actions and to reason over the resulting multimodal sensations (vision, sound, haptics, proprioception), as well as to plan an entire task execution based on the interactively acquired information. Our study demonstrates that LLMs can provide high-level planning and reasoning skills and control interactive robot behaviour in a multimodal environment, while multimodal modules with the context of the environmental state help ground the LLMs and extend their processing ability.
    Towards Cooperative Federated Learning over Heterogeneous Edge/Fog Networks. (arXiv:2303.08361v1 [cs.DC])
    Federated learning (FL) has been promoted as a popular technique for training machine learning (ML) models over edge/fog networks. Traditional implementations of FL have largely neglected the potential for inter-network cooperation, treating edge/fog devices and other infrastructure participating in ML as separate processing elements. Consequently, FL has been vulnerable to several dimensions of network heterogeneity, such as varying computation capabilities, communication resources, data qualities, and privacy demands. We advocate for cooperative federated learning (CFL), a cooperative edge/fog ML paradigm built on device-to-device (D2D) and device-to-server (D2S) interactions. Through D2D and D2S cooperation, CFL counteracts network heterogeneity in edge/fog networks through enabling a model/data/resource pooling mechanism, which will yield substantial improvements in ML model training quality and network resource consumption. We propose a set of core methodologies that form the foundation of D2D and D2S cooperation and present preliminary experiments that demonstrate their benefits. We also discuss new FL functionalities enabled by this cooperative framework such as the integration of unlabeled data and heterogeneous device privacy into ML model training. Finally, we describe some open research directions at the intersection of cooperative edge/fog and FL.
    R^2: Range Regularization for Model Compression and Quantization. (arXiv:2303.08253v1 [cs.LG])
    Model parameter regularization is a widely used technique to improve generalization, but also can be used to shape the weight distributions for various purposes. In this work, we shed light on how weight regularization can assist model quantization and compression techniques, and then propose range regularization (R^2) to further boost the quality of model optimization by focusing on the outlier prevention. By effectively regulating the minimum and maximum weight values from a distribution, we mold the overall distribution into a tight shape so that model compression and quantization techniques can better utilize their limited numeric representation powers. We introduce L-inf regularization, its extension margin regularization and a new soft-min-max regularization to be used as a regularization loss during full-precision model training. Coupled with state-of-the-art quantization and compression techniques, models trained with R^2 perform better on an average, specifically at lower bit weights with 16x compression ratio. We also demonstrate that R^2 helps parameter constrained models like MobileNetV1 achieve significant improvement of around 8% for 2 bit quantization and 7% for 1 bit compression.
    Marginalising over Stationary Kernels with Bayesian Quadrature. (arXiv:2106.07452v3 [stat.ML] UPDATED)
    Marginalising over families of Gaussian Process kernels produces flexible model classes with well-calibrated uncertainty estimates. Existing approaches require likelihood evaluations of many kernels, rendering them prohibitively expensive for larger datasets. We propose a Bayesian Quadrature scheme to make this marginalisation more efficient and thereby more practical. Through use of the maximum mean discrepancies between distributions, we define a kernel over kernels that captures invariances between Spectral Mixture (SM) Kernels. Kernel samples are selected by generalising an information-theoretic acquisition function for warped Bayesian Quadrature. We show that our framework achieves more accurate predictions with better calibrated uncertainty than state-of-the-art baselines, especially when given limited (wall-clock) time budgets.
    Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring. (arXiv:2303.08536v1 [cs.MM])
    This paper deals with Audio-Visual Speech Recognition (AVSR) under multimodal input corruption situations where audio inputs and visual inputs are both corrupted, which is not well addressed in previous research directions. Previous studies have focused on how to complement the corrupted audio inputs with the clean visual inputs with the assumption of the availability of clean visual inputs. However, in real life, clean visual inputs are not always accessible and can even be corrupted by occluded lip regions or noises. Thus, we firstly analyze that the previous AVSR models are not indeed robust to the corruption of multimodal input streams, the audio and the visual inputs, compared to uni-modal models. Then, we design multimodal input corruption modeling to develop robust AVSR models. Lastly, we propose a novel AVSR framework, namely Audio-Visual Reliability Scoring module (AV-RelScore), that is robust to the corrupted multimodal inputs. The AV-RelScore can determine which input modal stream is reliable or not for the prediction and also can exploit the more reliable streams in prediction. The effectiveness of the proposed method is evaluated with comprehensive experiments on popular benchmark databases, LRS2 and LRS3. We also show that the reliability scores obtained by AV-RelScore well reflect the degree of corruption and make the proposed model focus on the reliable multimodal representations.
    Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model. (arXiv:2303.08613v1 [cs.LG])
    We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. Such a problem is modeled as a Stackelberg game between the principal and the agent, where the principal announces a scoring rule that specifies the payment, and then the agent then chooses an effort level that maximizes her own profit and reports the information. We study the online setting of such a problem from the principal's perspective, i.e., designing the optimal scoring rule by repeatedly interacting with the strategic agent. We design a provably sample efficient algorithm that tailors the UCB algorithm (Auer et al., 2002) to our model, which achieves a sublinear $T^{2/3}$-regret after $T$ iterations. Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent's actions are incentivized. Furthermore, a key feature of our regret bound is that it is independent of the number of states of the environment.
    RGI : Regularized Graph Infomax for self-supervised learning on graphs. (arXiv:2303.08644v1 [cs.LG])
    Self-supervised learning is gaining considerable attention as a solution to avoid the requirement of extensive annotations in representation learning on graphs. We introduce \textit{Regularized Graph Infomax (RGI)}, a simple yet effective framework for node level self-supervised learning on graphs that trains a graph neural network encoder by maximizing the mutual information between node level local and global views, in contrast to previous works that employ graph level global views. The method promotes the predictability between views while regularizing the covariance matrices of the representations. Therefore, RGI is non-contrastive, does not depend on complex asymmetric architectures nor training tricks, is augmentation-free and does not rely on a two branch architecture. We run RGI on both transductive and inductive settings with popular graph benchmarks and show that it can achieve state-of-the-art performance regardless of its simplicity.
    Fully neuromorphic vision and control for autonomous drone flight. (arXiv:2303.08778v1 [cs.RO])
    Biological sensing and processing is asynchronous and sparse, leading to low-latency and energy-efficient perception and action. In robotics, neuromorphic hardware for event-based vision and spiking neural networks promises to exhibit similar characteristics. However, robotic implementations have been limited to basic tasks with low-dimensional sensory inputs and motor actions due to the restricted network size in current embedded neuromorphic processors and the difficulties of training spiking neural networks. Here, we present the first fully neuromorphic vision-to-control pipeline for controlling a freely flying drone. Specifically, we train a spiking neural network that accepts high-dimensional raw event-based camera data and outputs low-level control actions for performing autonomous vision-based flight. The vision part of the network, consisting of five layers and 28.8k neurons, maps incoming raw events to ego-motion estimates and is trained with self-supervised learning on real event data. The control part consists of a single decoding layer and is learned with an evolutionary algorithm in a drone simulator. Robotic experiments show a successful sim-to-real transfer of the fully learned neuromorphic pipeline. The drone can accurately follow different ego-motion setpoints, allowing for hovering, landing, and maneuvering sideways$\unicode{x2014}$even while yawing at the same time. The neuromorphic pipeline runs on board on Intel's Loihi neuromorphic processor with an execution frequency of 200 Hz, spending only 27 $\unicode{x00b5}$J per inference. These results illustrate the potential of neuromorphic sensing and processing for enabling smaller, more intelligent robots.
    Replay Buffer With Local Forgetting for Adaptive Deep Model-Based Reinforcement Learning. (arXiv:2303.08690v1 [cs.LG])
    One of the key behavioral characteristics used in neuroscience to determine whether the subject of study -- be it a rodent or a human -- exhibits model-based learning is effective adaptation to local changes in the environment. In reinforcement learning, however, recent work has shown that modern deep model-based reinforcement-learning (MBRL) methods adapt poorly to such changes. An explanation for this mismatch is that MBRL methods are typically designed with sample-efficiency on a single task in mind and the requirements for effective adaptation are substantially higher, both in terms of the learned world model and the planning routine. One particularly challenging requirement is that the learned world model has to be sufficiently accurate throughout relevant parts of the state-space. This is challenging for deep-learning-based world models due to catastrophic forgetting. And while a replay buffer can mitigate the effects of catastrophic forgetting, the traditional first-in-first-out replay buffer precludes effective adaptation due to maintaining stale data. In this work, we show that a conceptually simple variation of this traditional replay buffer is able to overcome this limitation. By removing only samples from the buffer from the local neighbourhood of the newly observed samples, deep world models can be built that maintain their accuracy across the state-space, while also being able to effectively adapt to changes in the reward function. We demonstrate this by applying our replay-buffer variation to a deep version of the classical Dyna method, as well as to recent methods such as PlaNet and DreamerV2, demonstrating that deep model-based methods can adapt effectively as well to local changes in the environment.
    Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs. (arXiv:2303.08473v1 [cs.CV])
    Image synthesis driven by computer graphics achieved recently a remarkable realism, yet synthetic image data generated this way reveals a significant domain gap with respect to real-world data. This is especially true in autonomous driving scenarios, which represent a critical aspect for overcoming utilizing synthetic data for training neural networks. We propose a method based on domain-invariant scene representation to directly synthesize traffic scene imagery without rendering. Specifically, we rely on synthetic scene graphs as our internal representation and introduce an unsupervised neural network architecture for realistic traffic scene synthesis. We enhance synthetic scene graphs with spatial information about the scene and demonstrate the effectiveness of our approach through scene manipulation.
    DeepAxe: A Framework for Exploration of Approximation and Reliability Trade-offs in DNN Accelerators. (arXiv:2303.08226v1 [cs.LG])
    While the role of Deep Neural Networks (DNNs) in a wide range of safety-critical applications is expanding, emerging DNNs experience massive growth in terms of computation power. It raises the necessity of improving the reliability of DNN accelerators yet reducing the computational burden on the hardware platforms, i.e. reducing the energy consumption and execution time as well as increasing the efficiency of DNN accelerators. Therefore, the trade-off between hardware performance, i.e. area, power and delay, and the reliability of the DNN accelerator implementation becomes critical and requires tools for analysis. In this paper, we propose a framework DeepAxe for design space exploration for FPGA-based implementation of DNNs by considering the trilateral impact of applying functional approximation on accuracy, reliability and hardware performance. The framework enables selective approximation of reliability-critical DNNs, providing a set of Pareto-optimal DNN implementation design space points for the target resource utilization requirements. The design flow starts with a pre-trained network in Keras, uses an innovative high-level synthesis environment DeepHLS and results in a set of Pareto-optimal design space points as a guide for the designer. The framework is demonstrated in a case study of custom and state-of-the-art DNNs and datasets.
    Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators. (arXiv:2303.08431v1 [cs.LG])
    Nonlinear control systems with partial information to the decision maker are prevalent in a variety of applications. As a step toward studying such nonlinear systems, this work explores reinforcement learning methods for finding the optimal policy in the nearly linear-quadratic regulator systems. In particular, we consider a dynamic system that combines linear and nonlinear components, and is governed by a policy with the same structure. Assuming that the nonlinear component comprises kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. Although the cost function is nonconvex in general, we establish the local strong convexity and smoothness in the vicinity of the global optimizer. Additionally, we propose an initialization mechanism to leverage these properties. Building on the developments, we design a policy gradient algorithm that is guaranteed to converge to the globally optimal policy with a linear rate.
    SegPrompt: Using Segmentation Map as a Better Prompt to Finetune Deep Models for Kidney Stone Classification. (arXiv:2303.08303v1 [cs.CV])
    Recently, deep learning has produced encouraging results for kidney stone classification using endoscope images. However, the shortage of annotated training data poses a severe problem in improving the performance and generalization ability of the trained model. It is thus crucial to fully exploit the limited data at hand. In this paper, we propose SegPrompt to alleviate the data shortage problems by exploiting segmentation maps from two aspects. First, SegPrompt integrates segmentation maps to facilitate classification training so that the classification model is aware of the regions of interest. The proposed method allows the image and segmentation tokens to interact with each other to fully utilize the segmentation map information. Second, we use the segmentation maps as prompts to tune the pretrained deep model, resulting in much fewer trainable parameters than vanilla finetuning. We perform extensive experiments on the collected kidney stone dataset. The results show that SegPrompt can achieve an advantageous balance between the model fitting ability and the generalization ability, eventually leading to an effective model with limited training data.
    On the uncertainty analysis of the data-enabled physics-informed neural network for solving neutron diffusion eigenvalue problem. (arXiv:2303.08455v1 [cs.LG])
    In practical engineering experiments, the data obtained through detectors are inevitably noisy. For the already proposed data-enabled physics-informed neural network (DEPINN) \citep{DEPINN}, we investigate the performance of DEPINN in calculating the neutron diffusion eigenvalue problem from several perspectives when the prior data contain different scales of noise. Further, in order to reduce the effect of noise and improve the utilization of the noisy prior data, we propose innovative interval loss functions and give some rigorous mathematical proofs. The robustness of DEPINN is examined on two typical benchmark problems through a large number of numerical results, and the effectiveness of the proposed interval loss function is demonstrated by comparison. This paper confirms the feasibility of the improved DEPINN for practical engineering applications in nuclear reactor physics.
    Optimization Design for Federated Learning in Heterogeneous 6G Networks. (arXiv:2303.08322v1 [cs.LG])
    With the rapid advancement of 5G networks, billions of smart Internet of Things (IoT) devices along with an enormous amount of data are generated at the network edge. While still at an early age, it is expected that the evolving 6G network will adopt advanced artificial intelligence (AI) technologies to collect, transmit, and learn this valuable data for innovative applications and intelligent services. However, traditional machine learning (ML) approaches require centralizing the training data in the data center or cloud, raising serious user-privacy concerns. Federated learning, as an emerging distributed AI paradigm with privacy-preserving nature, is anticipated to be a key enabler for achieving ubiquitous AI in 6G networks. However, there are several system and statistical heterogeneity challenges for effective and efficient FL implementation in 6G networks. In this article, we investigate the optimization approaches that can effectively address the challenging heterogeneity issues from three aspects: incentive mechanism design, network resource management, and personalized model optimization. We also present some open problems and promising directions for future research.
    DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. (arXiv:2208.12242v2 [cs.CV] UPDATED)
    Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, all while preserving the subject's key features. We also provide a new dataset and evaluation protocol for this new task of subject-driven generation. Project page: https://dreambooth.github.io/
    Muti-Agent Proximal Policy Optimization For Data Freshness in UAV-assisted Networks. (arXiv:2303.08680v1 [math.OC])
    Unmanned aerial vehicles (UAVs) are seen as a promising technology to perform a wide range of tasks in wireless communication networks. In this work, we consider the deployment of a group of UAVs to collect the data generated by IoT devices. Specifically, we focus on the case where the collected data is time-sensitive, and it is critical to maintain its timeliness. Our objective is to optimally design the UAVs' trajectories and the subsets of visited IoT devices such as the global Age-of-Updates (AoU) is minimized. To this end, we formulate the studied problem as a mixed-integer nonlinear programming (MINLP) under time and quality of service constraints. To efficiently solve the resulting optimization problem, we investigate the cooperative Multi-Agent Reinforcement Learning (MARL) framework and propose an RL approach based on the popular on-policy Reinforcement Learning (RL) algorithm: Policy Proximal Optimization (PPO). Our approach leverages the centralized training decentralized execution (CTDE) framework where the UAVs learn their optimal policies while training a centralized value function. Our simulation results show that the proposed MAPPO approach reduces the global AoU by at least a factor of 1/2 compared to conventional off-policy reinforcement learning approaches.
    Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions. (arXiv:2303.08595v1 [cs.LG])
    Pruning is a promising approach to compress deep learning models in order to deploy them on resource-constrained edge devices. However, many existing pruning solutions are based on unstructured pruning, which yields models that cannot efficiently run on commodity hardware; and they often require users to manually explore and tune the pruning process, which is time-consuming and often leads to sub-optimal results. To address these limitations, this paper presents Automatic Attention Pruning (AAP), an adaptive, attention-based, structured pruning approach to automatically generate small, accurate, and hardware-efficient models that meet user objectives. First, it proposes iterative structured pruning using activation-based attention maps to effectively identify and prune unimportant filters. Then, it proposes adaptive pruning policies for automatically meeting the pruning objectives of accuracy-critical, memory-constrained, and latency-sensitive tasks. A comprehensive evaluation shows that AAP substantially outperforms the state-of-the-art structured pruning works for a variety of model architectures. Our code is at: https://github.com/kaiqi123/Automatic-Attention-Pruning.git.
    Vehicle lateral control using Machine Learning for automated vehicle guidance. (arXiv:2303.08187v1 [cs.LG])
    Uncertainty in decision-making is crucial in the machine learning model used for a safety-critical system that operates in the real world. Therefore, it is important to handle uncertainty in a graceful manner for the safe operation of the CPS. In this work, we design a vehicle's lateral controller using a machine-learning model. To this end, we train a random forest model that is an ensemble model and a deep neural network model. Due to the ensemble in the random forest model, we can predict the confidence/uncertainty in the prediction. We train our controller on data generated from running the car on one track in the simulator and tested it on other tracks. Due to prediction in confidence, we could decide when the controller is less confident in prediction and takes control if needed. We have two results to share: first, even on a very small number of labeled data, a very good generalization capability of the random forest-based regressor in comparison with a deep neural network and accordingly random forest controller can drive on another similar track, where the deep neural network-based model fails to drive, and second confidence in predictions in random forest controller makes it possible to let us know when the controller is not confident in prediction and likely to fail. By creating a threshold, it was possible to take control when the controller is not safe and that is missing in a deep neural network-based controller.  ( 3 min )
    Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data. (arXiv:2303.08242v1 [stat.ML])
    The Internet of Things (IoT) system generates massive high-speed temporally correlated streaming data and is often connected with online inference tasks under computational or energy constraints. Online analysis of these streaming time series data often faces a trade-off between statistical efficiency and computational cost. One important approach to balance this trade-off is sampling, where only a small portion of the sample is selected for the model fitting and update. Motivated by the demands of dynamic relationship analysis of IoT system, we study the data-dependent sample selection and online inference problem for a multi-dimensional streaming time series, aiming to provide low-cost real-time analysis of high-speed power grid electricity consumption data. Inspired by D-optimality criterion in design of experiments, we propose a class of online data reduction methods that achieve an optimal sampling criterion and improve the computational efficiency of the online analysis. We show that the optimal solution amounts to a strategy that is a mixture of Bernoulli sampling and leverage score sampling. The leverage score sampling involves auxiliary estimations that have a computational advantage over recursive least squares updates. Theoretical properties of the auxiliary estimations involved are also discussed. When applied to European power grid consumption data, the proposed leverage score based sampling methods outperform the benchmark sampling method in online estimation and prediction. The general applicability of the sampling-assisted online estimation method is assessed via simulation studies.  ( 3 min )
    Digital staining in optical microscopy using deep learning -- a review. (arXiv:2303.08140v1 [eess.IV])
    Until recently, conventional biochemical staining had the undisputed status as well-established benchmark for most biomedical problems related to clinical diagnostics, fundamental research and biotechnology. Despite this role as gold-standard, staining protocols face several challenges, such as a need for extensive, manual processing of samples, substantial time delays, altered tissue homeostasis, limited choice of contrast agents for a given sample, 2D imaging instead of 3D tomography and many more. Label-free optical technologies, on the other hand, do not rely on exogenous and artificial markers, by exploiting intrinsic optical contrast mechanisms, where the specificity is typically less obvious to the human observer. Over the past few years, digital staining has emerged as a promising concept to use modern deep learning for the translation from optical contrast to established biochemical contrast of actual stainings. In this review article, we provide an in-depth analysis of the current state-of-the-art in this field, suggest methods of good practice, identify pitfalls and challenges and postulate promising advances towards potential future implementations and applications.  ( 2 min )
    Dataset Management Platform for Machine Learning. (arXiv:2303.08301v1 [cs.DB])
    The quality of the data in a dataset can have a substantial impact on the performance of a machine learning model that is trained and/or evaluated using the dataset. Effective dataset management, including tasks such as data cleanup, versioning, access control, dataset transformation, automation, integrity and security, etc., can help improve the efficiency and speed of the machine learning process. Currently, engineers spend a substantial amount of manual effort and time to manage dataset versions or to prepare datasets for machine learning tasks. This disclosure describes a platform to manage and use datasets effectively. The techniques integrate dataset management and dataset transformation mechanisms. A storage engine is described that acts as a source of truth for all data and handles versioning, access control etc. The dataset transformation mechanism is a key part to generate a dataset (snapshot) to serve different purposes. The described techniques can support different workflows, pipelines, or data orchestration needs, e.g., for training and/or evaluation of machine learning models.  ( 2 min )
    Few-Shot Classification of Autism Spectrum Disorder using Site-Agnostic Meta-Learning and Brain MRI. (arXiv:2303.08224v1 [eess.IV])
    For machine learning applications in medical imaging, the availability of training data is often limited, which hampers the design of radiological classifiers for subtle conditions such as autism spectrum disorder (ASD). Transfer learning is one method to counter this problem of low training data regimes. Here we explore the use of meta-learning for very low data regimes in the context of having prior data from multiple sites - an approach we term site-agnostic meta-learning. Inspired by the effectiveness of meta-learning for optimizing a model across multiple tasks, here we propose a framework to adapt it to learn across multiple sites. We tested our meta-learning model for classifying ASD versus typically developing controls in 2,201 T1-weighted (T1-w) MRI scans collected from 38 imaging sites as part of Autism Brain Imaging Data Exchange (ABIDE) [age: 5.2-64.0 years]. The method was trained to find a good initialization state for our model that can quickly adapt to data from new unseen sites by fine-tuning on the limited data that is available. The proposed method achieved an ROC-AUC=0.857 on 370 scans from 7 unseen sites in ABIDE using a few-shot setting of 2-way 20-shot i.e., 20 training samples per site. Our results outperformed a transfer learning baseline by generalizing across a wider range of sites as well as other related prior work. We also tested our model in a zero-shot setting on an independent test site without any additional fine-tuning. Our experiments show the promise of the proposed site-agnostic meta-learning framework for challenging neuroimaging tasks involving multi-site heterogeneity with limited availability of training data.  ( 3 min )
    Linking Alternative Fuel Vehicles Adoption with Socioeconomic Status and Air Quality Index. (arXiv:2303.08286v1 [cs.AI])
    This is a study on the potential widespread usage of alternative fuel vehicles, linking them with the socio-economic status of the respective consumers as well as the impact on the resulting air quality index. Research in this area aims to leverage machine learning techniques in order to promote appropriate policies for the proliferation of alternative fuel vehicles such as electric vehicles with due justice to different population groups. Pearson correlation coefficient is deployed in the modeling the relationships between socio-economic data, air quality index and data on alternative fuel vehicles. Linear regression is used to conduct predictive modeling on air quality index as per the adoption of alternative fuel vehicles, based on socio-economic factors. This work exemplifies artificial intelligence for social good.  ( 2 min )
    Artificial intelligence for artificial materials: moir\'e atom. (arXiv:2303.08162v1 [cond-mat.str-el])
    Moir\'e engineering in atomically thin van der Waals heterostructures creates artificial quantum materials with designer properties. We solve the many-body problem of interacting electrons confined to a moir\'e superlattice potential minimum (the moir\'e atom) using a 2D fermionic neural network. We show that strong Coulomb interactions in combination with the anisotropic moir\'e potential lead to striking ``Wigner molecule" charge density distributions observable with scanning tunneling microscopy.  ( 2 min )
    Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models. (arXiv:2303.08440v1 [eess.IV])
    Diffusion models have become a popular approach for image generation and reconstruction due to their numerous advantages. However, most diffusion-based inverse problem-solving methods only deal with 2D images, and even recently published 3D methods do not fully exploit the 3D distribution prior. To address this, we propose a novel approach using two perpendicular pre-trained 2D diffusion models to solve the 3D inverse problem. By modeling the 3D data distribution as a product of 2D distributions sliced in different directions, our method effectively addresses the curse of dimensionality. Our experimental results demonstrate that our method is highly effective for 3D medical image reconstruction tasks, including MRI Z-axis super-resolution, compressed sensing MRI, and sparse-view CT. Our method can generate high-quality voxel volumes suitable for medical applications.  ( 2 min )
    Model-to-Circuit Cross-Approximation For Printed Machine Learning Classifiers. (arXiv:2303.08255v1 [cs.LG])
    Printed electronics (PE) promises on-demand fabrication, low non-recurring engineering costs, and sub-cent fabrication costs. It also allows for high customization that would be infeasible in silicon, and bespoke architectures prevail to improve the efficiency of emerging PE machine learning (ML) applications. Nevertheless, large feature sizes in PE prohibit the realization of complex ML models in PE, even with bespoke architectures. In this work, we present an automated, cross-layer approximation framework tailored to bespoke architectures that enable complex ML models, such as Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs), in PE. Our framework adopts cooperatively a hardware-driven coefficient approximation of the ML model at algorithmic level, a netlist pruning at logic level, and a voltage over-scaling at the circuit level. Extensive experimental evaluation on 12 MLPs and 12 SVMs and more than 6000 approximate and exact designs demonstrates that our model-to-circuit cross-approximation delivers power and area optimal designs that, compared to the state-of-the-art exact designs, feature on average 51% and 66% area and power reduction, respectively, for less than 5% accuracy loss. Finally, we demonstrate that our framework enables 80% of the examined classifiers to be battery-powered with almost identical accuracy with the exact designs, paving thus the way towards smart complex printed applications.  ( 2 min )
    The Elements of Visual Art Recommendation: Learning Latent Semantic Representations of Paintings. (arXiv:2303.08182v1 [cs.IR])
    Artwork recommendation is challenging because it requires understanding how users interact with highly subjective content, the complexity of the concepts embedded within the artwork, and the emotional and cognitive reflections they may trigger in users. In this paper, we focus on efficiently capturing the elements (i.e., latent semantic relationships) of visual art for personalized recommendation. We propose and study recommender systems based on textual and visual feature learning techniques, as well as their combinations. We then perform a small-scale and a large-scale user-centric evaluation of the quality of the recommendations. Our results indicate that textual features compare favourably with visual ones, whereas a fusion of both captures the most suitable hidden semantic relationships for artwork recommendation. Ultimately, this paper contributes to our understanding of how to deliver content that suitably matches the user's interests and how they are perceived.  ( 2 min )
    Allegro-Legato: Scalable, Fast, and Robust Neural-Network Quantum Molecular Dynamics via Sharpness-Aware Minimization. (arXiv:2303.08169v1 [cs.DC])
    Neural-network quantum molecular dynamics (NNQMD) simulations based on machine learning are revolutionizing atomistic simulations of materials by providing quantum-mechanical accuracy but orders-of-magnitude faster, illustrated by ACM Gordon Bell prize (2020) and finalist (2021). State-of-the-art (SOTA) NNQMD model founded on group theory featuring rotational equivariance and local descriptors has provided much higher accuracy and speed than those models, thus named Allegro (meaning fast). On massively parallel supercomputers, however, it suffers a fidelity-scaling problem, where growing number of unphysical predictions of interatomic forces prohibits simulations involving larger numbers of atoms for longer times. Here, we solve this problem by combining the Allegro model with sharpness aware minimization (SAM) for enhancing the robustness of model through improved smoothness of the loss landscape. The resulting Allegro-Legato (meaning fast and "smooth") model was shown to elongate the time-to-failure $t_\textrm{failure}$, without sacrificing computational speed or accuracy. Specifically, Allegro-Legato exhibits much weaker dependence of timei-to-failure on the problem size, $t_{\textrm{failure}} \propto N^{-0.14}$ ($N$ is the number of atoms) compared to the SOTA Allegro model $\left(t_{\textrm{failure}} \propto N^{-0.29}\right)$, i.e., systematically delayed time-to-failure, thus allowing much larger and longer NNQMD simulations without failure. The model also exhibits excellent computational scalability and GPU acceleration on the Polaris supercomputer at Argonne Leadership Computing Facility. Such scalable, accurate, fast and robust NNQMD models will likely find broad applications in NNQMD simulations on emerging exaflop/s computers, with a specific example of accounting for nuclear quantum effects in the dynamics of ammonia.  ( 3 min )
    Improving Adversarial Robustness with Hypersphere Embedding and Angular-based Regularizations. (arXiv:2303.08289v1 [cs.LG])
    Adversarial training (AT) methods have been found to be effective against adversarial attacks on deep neural networks. Many variants of AT have been proposed to improve its performance. Pang et al. [1] have recently shown that incorporating hypersphere embedding (HE) into the existing AT procedures enhances robustness. We observe that the existing AT procedures are not designed for the HE framework, and thus fail to adequately learn the angular discriminative information available in the HE framework. In this paper, we propose integrating HE into AT with regularization terms that exploit the rich angular information available in the HE framework. Specifically, our method, termed angular-AT, adds regularization terms to AT that explicitly enforce weight-feature compactness and inter-class separation; all expressed in terms of angular features. Experimental results show that angular-AT further improves adversarial robustness.  ( 2 min )
    Learning From High-Dimensional Cyber-Physical Data Streams for Diagnosing Faults in Smart Grids. (arXiv:2303.08300v1 [cs.LG])
    The performance of fault diagnosis systems is highly affected by data quality in cyber-physical power systems. These systems generate massive amounts of data that overburden the system with excessive computational costs. Another issue is the presence of noise in recorded measurements, which prevents building a precise decision model. Furthermore, the diagnostic model is often provided with a mixture of redundant measurements that may deviate it from learning normal and fault distributions. This paper presents the effect of feature engineering on mitigating the aforementioned challenges in cyber-physical systems. Feature selection and dimensionality reduction methods are combined with decision models to simulate data-driven fault diagnosis in a 118-bus power system. A comparative study is enabled accordingly to compare several advanced techniques in both domains. Dimensionality reduction and feature selection methods are compared both jointly and separately. Finally, experiments are concluded, and a setting is suggested that enhances data quality for fault diagnosis.  ( 2 min )
    Towards a Deep Learning Pain-Level Detection Deployment at UAE for Patient-Centric-Pain Management and Diagnosis Support: Framework and Performance Evaluation. (arXiv:2303.08273v1 [cs.HC])
    The outbreak of the COVID-19 pandemic revealed the criticality of timely intervention in a situation exacerbated by a shortage in medical staff and equipment. Pain-level screening is the initial step toward identifying the severity of patient conditions. Automatic recognition of state and feelings help in identifying patient symptoms to take immediate adequate action and providing a patient-centric medical plan tailored to a patient's state. In this paper, we propose a framework for pain-level detection for deployment in the United Arab Emirates and assess its performance using the most used approaches in the literature. Our results show that a deployment of a pain-level deep learning detection framework is promising in identifying the pain level accurately.  ( 2 min )
    Bayesian Beta-Bernoulli Process Sparse Coding with Deep Neural Networks. (arXiv:2303.08230v1 [cs.LG])
    Several approximate inference methods have been proposed for deep discrete latent variable models. However, non-parametric methods which have previously been successfully employed for classical sparse coding models have largely been unexplored in the context of deep models. We propose a non-parametric iterative algorithm for learning discrete latent representations in such deep models. Additionally, to learn scale invariant discrete features, we propose local data scaling variables. Lastly, to encourage sparsity in our representations, we propose a Beta-Bernoulli process prior on the latent factors. We evaluate our spare coding model coupled with different likelihood models. We evaluate our method across datasets with varying characteristics and compare our results to current amortized approximate inference methods.  ( 2 min )
    Hall effect thruster design via deep neural network for additive manufacturing. (arXiv:2303.08227v1 [cs.LG])
    Hall effect thrusters are one of the most versatile and popular electric propulsion systems for space use. Industry trends towards interplanetary missions arise advances in design development of such propulsion systems. It is understood that correct sizing of discharge channel in Hall effect thruster impact performance greatly. Since the complete physics model of such propulsion system is not yet optimized for fast computations and design iterations, most thrusters are being designed using so-called scaling laws. But this work focuses on rather novel approach, which is outlined less frequently than ordinary scaling design approach in literature. Using deep machine learning it is possible to create predictive performance model, which can be used to effortlessly get design of required hall thruster with required characteristics using way less computational power than design from scratch and way more flexible than usual scaling approach.  ( 2 min )
    A 2-opt Algorithm for Locally Optimal Set Partition Optimization. (arXiv:2303.08219v1 [cs.DS])
    Our research deals with the optimization version of the set partition problem, where the objective is to minimize the absolute difference between the sums of the two disjoint partitions. Although this problem is known to be NP-hard and requires exponential time to solve, we propose a less demanding version of this problem where the goal is to find a locally optimal solution. In our approach, we consider the local optimality in respect to any movement of at most two elements. To accomplish this, we developed an algorithm that can generate a locally optimal solution in at most $O(N^2)$ time and $O(N)$ space. Our algorithm can handle arbitrary input precisions and does not require positive or integer inputs. Hence, it can be applied in various problem scenarios with ease.  ( 2 min )
    Machine Learning Approaches in Agile Manufacturing with Recycled Materials for Sustainability. (arXiv:2303.08291v1 [cs.AI])
    It is important to develop sustainable processes in materials science and manufacturing that are environmentally friendly. AI can play a significant role in decision support here as evident from our earlier research leading to tools developed using our proposed machine learning based approaches. Such tools served the purpose of computational estimation and expert systems. This research addresses environmental sustainability in materials science via decision support in agile manufacturing using recycled and reclaimed materials. It is a safe and responsible way to turn a specific waste stream to value-added products. We propose to use data-driven methods in AI by applying machine learning models for predictive analysis to guide decision support in manufacturing. This includes harnessing artificial neural networks to study parameters affecting heat treatment of materials and impacts on their properties; deep learning via advances such as convolutional neural networks to explore grain size detection; and other classifiers such as Random Forests to analyze phrase fraction detection. Results with all these methods seem promising to embark on further work, e.g. ANN yields accuracy around 90\% for predicting micro-structure development as per quench tempering, a heat treatment process. Future work entails several challenges: investigating various computer vision models (VGG, ResNet etc.) to find optimal accuracy, efficiency and robustness adequate for sustainable processes; creating domain-specific tools using machine learning for decision support in agile manufacturing; and assessing impacts on sustainability with metrics incorporating the appropriate use of recycled materials as well as the effectiveness of developed products. Our work makes impacts on green technology for smart manufacturing, and is motivated by related work in the highly interesting realm of AI for materials science.  ( 3 min )
    Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization. (arXiv:2303.08142v1 [cs.SE])
    Performance optimization is an increasingly challenging but often repetitive task. While each platform has its quirks, the underlying code transformations rely on data movement and computational characteristics that recur across applications. This paper proposes to leverage those similarities by constructing an embedding space for subprograms. The continuous space captures both static and dynamic properties of loop nests via symbolic code analysis and performance profiling, respectively. Performance embeddings enable direct knowledge transfer of performance tuning between applications, which can result from autotuning or tailored improvements. We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils. Transfer tuning reduces the search complexity by up to four orders of magnitude and outperforms the MKL library in sparse-dense matrix multiplication. The results exhibit clear correspondences between program characteristics and optimizations, outperforming prior specialized state-of-the-art approaches and generalizing beyond their capabilities.  ( 2 min )
    RODD: Robust Outlier Detection in Data Cubes. (arXiv:2303.08193v1 [cs.DB])
    Data cubes are multidimensional databases, often built from several separate databases, that serve as flexible basis for data analysis. Surprisingly, outlier detection on data cubes has not yet been treated extensively. In this work, we provide the first framework to evaluate robust outlier detection methods in data cubes (RODD). We introduce a novel random forest-based outlier detection approach (RODD-RF) and compare it with more traditional methods based on robust location estimators. We propose a general type of test data and examine all methods in a simulation study. Moreover, we apply ROOD-RF to real world data. The results show that RODD-RF can lead to improved outlier detection.  ( 2 min )
  • Open

    Recurrent Neural Networks and Universal Approximation of Bayesian Filters. (arXiv:2211.00335v2 [stat.ML] UPDATED)
    We consider the Bayesian optimal filtering problem: i.e. estimating some conditional statistics of a latent time-series signal from an observation sequence. Classical approaches often rely on the use of assumed or estimated transition and observation models. Instead, we formulate a generic recurrent neural network framework and seek to learn directly a recursive mapping from observational inputs to the desired estimator statistics. The main focus of this article is the approximation capabilities of this framework. We provide approximation error bounds for filtering in general non-compact domains. We also consider strong time-uniform approximation error bounds that guarantee good long-time performance. We discuss and illustrate a number of practical concerns and implications of these results.
    Borda Regret Minimization for Generalized Linear Dueling Bandits. (arXiv:2303.08816v1 [cs.LG])
    Dueling bandits are widely used to model preferential feedback that is prevalent in machine learning applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a new and highly expressive generalized linear dueling bandits model, which covers many existing models. Surprisingly, the Borda regret minimization problem turns out to be difficult, as we prove a regret lower bound of order $\Omega(d^{2/3} T^{2/3})$, where $d$ is the dimension of contextual vectors and $T$ is the time horizon. To attain the lower bound, we propose an explore-then-commit type algorithm, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$. When the number of items/arms $K$ is small, our algorithm can achieve a smaller regret $\tilde{O}( (d \log K)^{1/3} T^{2/3})$ with proper choices of hyperparameters. We also conduct empirical experiments on both synthetic data and a simulated real-world environment, which corroborate our theoretical analysis.
    Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting. (arXiv:2110.03135v3 [cs.LG] UPDATED)
    We show that label noise exists in adversarial training. Such label noise is due to the mismatch between the true label distribution of adversarial examples and the label inherited from clean examples - the true label distribution is distorted by the adversarial perturbation, but is neglected by the common practice that inherits labels from clean examples. Recognizing label noise sheds insights on the prevalence of robust overfitting in adversarial training, and explains its intriguing dependence on perturbation radius and data quality. Also, our label noise perspective aligns well with our observations of the epoch-wise double descent in adversarial training. Guided by our analyses, we proposed a method to automatically calibrate the label to address the label noise and robust overfitting. Our method achieves consistent performance improvements across various models and datasets without introducing new hyper-parameters or additional tuning.
    Generalized Kernel Regularized Least Squares. (arXiv:2209.14355v3 [stat.ML] UPDATED)
    Kernel Regularized Least Squares (KRLS) is a popular method for flexibly estimating models that may have complex relationships between variables. However, its usefulness to many researchers is limited for two reasons. First, existing approaches are inflexible and do not allow KRLS to be combined with theoretically-motivated extensions such as random effects, unregularized fixed effects, or non-Gaussian outcomes. Second, estimation is extremely computationally intensive for even modestly sized datasets. Our paper addresses both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be re-formulated as a hierarchical model thereby allowing easy inference and modular model construction where KRLS can be used alongside random effects, splines, and unregularized fixed effects. Computationally, we also implement random sketching to dramatically accelerate estimation while incurring a limited penalty in estimation quality. We demonstrate that gKRLS can be fit on datasets with tens of thousands of observations in under one minute. Further, state-of-the-art techniques that require fitting the model over a dozen times (e.g. meta-learners) can be estimated quickly.
    Learning Resilient Radio Resource Management Policies with Graph Neural Networks. (arXiv:2203.11012v2 [eess.SP] UPDATED)
    We consider the problems of user selection and power control in wireless interference networks, comprising multiple access points (APs) communicating with a group of user equipment devices (UEs) over a shared wireless medium. To achieve a high aggregate rate, while ensuring fairness across all users, we formulate a resilient radio resource management (RRM) policy optimization problem with per-user minimum-capacity constraints that adapt to the underlying network conditions via learnable slack variables. We reformulate the problem in the Lagrangian dual domain, and show that we can parameterize the RRM policies using a finite set of parameters, which can be trained alongside the slack and dual variables via an unsupervised primal-dual approach thanks to a provably small duality gap. We use a scalable and permutation-equivariant graph neural network (GNN) architecture to parameterize the RRM policies based on a graph topology derived from the instantaneous channel conditions. Through experimental results, we verify that the minimum-capacity constraints adapt to the underlying network configurations and channel conditions. We further demonstrate that, thanks to such adaptation, our proposed method achieves a superior tradeoff between the average rate and the 5th percentile rate -- a metric that quantifies the level of fairness in the resource allocation decisions -- as compared to baseline algorithms.
    Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model. (arXiv:2303.08613v1 [cs.LG])
    We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. Such a problem is modeled as a Stackelberg game between the principal and the agent, where the principal announces a scoring rule that specifies the payment, and then the agent then chooses an effort level that maximizes her own profit and reports the information. We study the online setting of such a problem from the principal's perspective, i.e., designing the optimal scoring rule by repeatedly interacting with the strategic agent. We design a provably sample efficient algorithm that tailors the UCB algorithm (Auer et al., 2002) to our model, which achieves a sublinear $T^{2/3}$-regret after $T$ iterations. Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent's actions are incentivized. Furthermore, a key feature of our regret bound is that it is independent of the number of states of the environment.
    Bayesian Beta-Bernoulli Process Sparse Coding with Deep Neural Networks. (arXiv:2303.08230v1 [cs.LG])
    Several approximate inference methods have been proposed for deep discrete latent variable models. However, non-parametric methods which have previously been successfully employed for classical sparse coding models have largely been unexplored in the context of deep models. We propose a non-parametric iterative algorithm for learning discrete latent representations in such deep models. Additionally, to learn scale invariant discrete features, we propose local data scaling variables. Lastly, to encourage sparsity in our representations, we propose a Beta-Bernoulli process prior on the latent factors. We evaluate our spare coding model coupled with different likelihood models. We evaluate our method across datasets with varying characteristics and compare our results to current amortized approximate inference methods.
    Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data. (arXiv:2303.08242v1 [stat.ML])
    The Internet of Things (IoT) system generates massive high-speed temporally correlated streaming data and is often connected with online inference tasks under computational or energy constraints. Online analysis of these streaming time series data often faces a trade-off between statistical efficiency and computational cost. One important approach to balance this trade-off is sampling, where only a small portion of the sample is selected for the model fitting and update. Motivated by the demands of dynamic relationship analysis of IoT system, we study the data-dependent sample selection and online inference problem for a multi-dimensional streaming time series, aiming to provide low-cost real-time analysis of high-speed power grid electricity consumption data. Inspired by D-optimality criterion in design of experiments, we propose a class of online data reduction methods that achieve an optimal sampling criterion and improve the computational efficiency of the online analysis. We show that the optimal solution amounts to a strategy that is a mixture of Bernoulli sampling and leverage score sampling. The leverage score sampling involves auxiliary estimations that have a computational advantage over recursive least squares updates. Theoretical properties of the auxiliary estimations involved are also discussed. When applied to European power grid consumption data, the proposed leverage score based sampling methods outperform the benchmark sampling method in online estimation and prediction. The general applicability of the sampling-assisted online estimation method is assessed via simulation studies.
    Lost in the Shuffle: Testing Power in the Presence of Errorful Network Vertex Labels. (arXiv:2208.08638v3 [stat.ME] UPDATED)
    Many two-sample network hypothesis testing methodologies operate under the implicit assumption that the vertex correspondence across networks is a priori known. In this paper, we consider the degradation of power in two-sample graph hypothesis testing when there are misaligned/label-shuffled vertices across networks. In the context of random dot product and stochastic block model networks, we theoretically explore the power loss due to shuffling for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices. The loss in testing power is further reinforced by numerous simulations and experiments, both in the stochastic block model and in the random dot product graph model, where we compare the power loss across multiple recently proposed tests in the literature. Lastly, we demonstrate the impact that shuffling can have in real-data testing in a pair of examples from neuroscience and from social network analysis.
    Gradient Gating for Deep Multi-Rate Learning on Graphs. (arXiv:2210.00513v2 [cs.LG] UPDATED)
    We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs). Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph. Local gradients are harnessed to further modulate message passing updates. Our framework flexibly allows one to use any basic GNN layer as a wrapper around which the multi-rate gradient gating mechanism is built. We rigorously prove that G$^2$ alleviates the oversmoothing problem and allows the design of deep GNNs. Empirical results are presented to demonstrate that the proposed framework achieves state-of-the-art performance on a variety of graph learning tasks, including on large-scale heterophilic graphs.
    Distribution-free Deviation Bounds of Learning via Model Selection with Cross-validation Risk Estimation. (arXiv:2303.08777v1 [stat.ML])
    Cross-validation techniques for risk estimation and model selection are widely used in statistics and machine learning. However, the understanding of the theoretical properties of learning via model selection with cross-validation risk estimation is quite low in face of its widespread use. In this context, this paper presents learning via model selection with cross-validation risk estimation as a general systematic learning framework within classical statistical learning theory and establishes distribution-free deviation bounds in terms of VC dimension, giving detailed proofs of the results and considering both bounded and unbounded loss functions. We also deduce conditions under which the deviation bounds of learning via model selection are tighter than that of learning via empirical risk minimization in the whole hypotheses space, supporting the better performance of model selection frameworks observed empirically in some instances.
    Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer. (arXiv:2303.08622v1 [cs.CV])
    Diffusion models have shown great promise in text-guided image style transfer, but there is a trade-off between style transformation and content preservation due to their stochastic nature. Existing methods require computationally expensive fine-tuning of diffusion models or additional neural network. To address this, here we propose a zero-shot contrastive loss for diffusion models that doesn't require additional fine-tuning or auxiliary networks. By leveraging patch-wise contrastive loss between generated samples and original image embeddings in the pre-trained diffusion model, our method can generate images with the same semantic content as the source image in a zero-shot manner. Our approach outperforms existing methods while preserving content and requiring no additional training, not only for image style transfer but also for image-to-image translation and manipulation. Our experimental results validate the effectiveness of our proposed method.
    Interpretable Ensembles of Hyper-Rectangles as Base Models. (arXiv:2303.08625v1 [cs.LG])
    A new extremely simple ensemble-based model with the uniformly generated axis-parallel hyper-rectangles as base models (HRBM) is proposed. Two types of HRBMs are studied: closed rectangles and corners. The main idea behind HRBM is to consider and count training examples inside and outside each rectangle. It is proposed to incorporate HRBMs into the gradient boosting machine (GBM). Despite simplicity of HRBMs, it turns out that these simple base models allow us to construct effective ensemble-based models and avoid overfitting. A simple method for calculating optimal regularization parameters of the ensemble-based model, which can be modified in the explicit way at each iteration of GBM, is considered. Moreover, a new regularization called the "step height penalty" is studied in addition to the standard L1 and L2 regularizations. An extremely simple approach to the proposed ensemble-based model prediction interpretation by using the well-known method SHAP is proposed. It is shown that GBM with HRBM can be regarded as a model extending a set of interpretable models for explaining black-box models. Numerical experiments with real datasets illustrate the proposed GBM with HRBMs for regression and classification problems. Experiments also illustrate computational efficiency of the proposed SHAP modifications. The code of proposed algorithms implementing GBM with HRBM is publicly available.
    Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA Models. (arXiv:2207.06950v3 [stat.ML] UPDATED)
    Low-order functional ANOVA (fANOVA) models have been rediscovered in the machine learning (ML) community under the guise of inherently interpretable machine learning. Explainable Boosting Machines or EBM (Lou et al. 2013) and GAMI-Net (Yang et al. 2021) are two recently proposed ML algorithms for fitting functional main effects and second-order interactions. We propose a new algorithm, called GAMI-Tree, that is similar to EBM, but has a number of features that lead to better performance. It uses model-based trees as base learners and incorporates a new interaction filtering method that is better at capturing the underlying interactions. In addition, our iterative training method converges to a model with better predictive performance, and the embedded purification ensures that interactions are hierarchically orthogonal to main effects. The algorithm does not need extensive tuning, and our implementation is fast and efficient. We use simulated and real datasets to compare the performance and interpretability of GAMI-Tree with EBM and GAMI-Net.
    Robust online active learning. (arXiv:2302.00422v2 [stat.ML] UPDATED)
    In many industrial applications, obtaining labeled observations is not straightforward as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sample outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.
    Delay-SDE-net: A deep learning approach for time series modelling with memory and uncertainty estimates. (arXiv:2303.08587v1 [cs.LG])
    To model time series accurately is important within a wide range of fields. As the world is generally too complex to be modelled exactly, it is often meaningful to assess the probability of a dynamical system to be in a specific state. This paper presents the Delay-SDE-net, a neural network model based on stochastic delay differential equations (SDDEs). The use of SDDEs with multiple delays as modelling framework makes it a suitable model for time series with memory effects, as it includes memory through previous states of the system. The stochastic part of the Delay-SDE-net provides a basis for estimating uncertainty in modelling, and is split into two neural networks to account for aleatoric and epistemic uncertainty. The uncertainty is provided instantly, making the model suitable for applications where time is sparse. We derive the theoretical error of the Delay-SDE-net and analyze the convergence rate numerically. At comparisons with similar models, the Delay-SDE-net has consistently the best performance, both in predicting time series values and uncertainties.
    Training Neural Networks for Sequential Change-point Detection. (arXiv:2210.17312v2 [cs.LG] UPDATED)
    Detecting an abrupt distributional shift of a data stream, known as change-point detection, is a fundamental problem in statistics and machine learning. We introduce a novel approach for online change-point detection using neural networks. To be specific, our approach is training neural networks to compute the cumulative sum of a detection statistic sequentially, which exhibits a significant change when a change-point occurs. We demonstrated the superiority and potential of the proposed method in detecting change-point using both synthetic and real-world data.
    Probabilistic Reconciliation of Count Time Series. (arXiv:2207.09322v3 [stat.ME] UPDATED)
    Forecast reconciliation is an important research topic. Yet, there is currently neither formal framework nor practical method for the probabilistic reconciliation of count time series. In this paper we propose a definition of coherency and reconciled probabilistic forecast which applies to both real-valued and count variables and a novel method for probabilistic reconciliation. It is based on a generalization of Bayes' rule and it can reconcile both real-value and count variables. When applied to count variables, it yields a reconciled probability mass function. Our experiments with the temporal reconciliation of count variables show a major forecast improvement compared to the probabilistic Gaussian reconciliation.
    Practicality of generalization guarantees for unsupervised domain adaptation with neural networks. (arXiv:2303.08720v1 [cs.LG])
    Understanding generalization is crucial to confidently engineer and deploy machine learning models, especially when deployment implies a shift in the data domain. For such domain adaptation problems, we seek generalization bounds which are tractably computable and tight. If these desiderata can be reached, the bounds can serve as guarantees for adequate performance in deployment. However, in applications where deep neural networks are the models of choice, deriving results which fulfill these remains an unresolved challenge; most existing bounds are either vacuous or has non-estimable terms, even in favorable conditions. In this work, we evaluate existing bounds from the literature with potential to satisfy our desiderata on domain adaptation image classification tasks, where deep neural networks are preferred. We find that all bounds are vacuous and that sample generalization terms account for much of the observed looseness, especially when these terms interact with measures of domain shift. To overcome this and arrive at the tightest possible results, we combine each bound with recent data-dependent PAC-Bayes analysis, greatly improving the guarantees. We find that, when domain overlap can be assumed, a simple importance weighting extension of previous work provides the tightest estimable bound. Finally, we study which terms dominate the bounds and identify possible directions for further improvement.
    The Benefits of Mixup for Feature Learning. (arXiv:2303.08433v1 [cs.LG])
    Mixup, a simple data augmentation method that randomly mixes two data points via linear interpolation, has been extensively applied in various deep learning applications to gain better generalization. However, the theoretical underpinnings of its efficacy are not yet fully understood. In this paper, we aim to seek a fundamental understanding of the benefits of Mixup. We first show that Mixup using different linear interpolation parameters for features and labels can still achieve similar performance to the standard Mixup. This indicates that the intuitive linearity explanation in Zhang et al., (2018) may not fully explain the success of Mixup. Then we perform a theoretical study of Mixup from the feature learning perspective. We consider a feature-noise data model and show that Mixup training can effectively learn the rare features (appearing in a small fraction of data) from its mixture with the common features (appearing in a large fraction of data). In contrast, standard training can only learn the common features but fails to learn the rare features, thus suffering from bad generalization performance. Moreover, our theoretical analysis also shows that the benefits of Mixup for feature learning are mostly gained in the early training phase, based on which we propose to apply early stopping in Mixup. Experimental results verify our theoretical findings and demonstrate the effectiveness of the early-stopped Mixup training.
    Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity. (arXiv:2204.06242v2 [stat.ML] UPDATED)
    Many real-world systems are described not only by data from a single source but via multiple data views. In genomic medicine, for instance, patients can be characterized by data from different molecular layers. Latent variable models with structured sparsity are a commonly used tool for disentangling variation within and across data views. However, their interpretability is cumbersome since it requires a direct inspection and interpretation of each factor from domain experts. Here, we propose MuVI, a novel multi-view latent variable model based on a modified horseshoe prior for modeling structured sparsity. This facilitates the incorporation of limited and noisy domain knowledge, thereby allowing for an analysis of multi-view data in an inherently explainable manner. We demonstrate that our model (i) outperforms state-of-the-art approaches for modeling structured sparsity in terms of the reconstruction error and the precision/recall, (ii) robustly integrates noisy domain expertise in the form of feature sets, (iii) promotes the identifiability of factors and (iv) infers interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
    Weisfeiler and Leman go Machine Learning: The Story so far. (arXiv:2112.09992v3 [cs.LG] UPDATED)
    In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, have emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine-learning setting, focusing on the supervised regime. We discuss the theoretical background, show how to use it for supervised graph and node representation learning, discuss recent extensions, and outline the algorithm's connection to (permutation-)equivariant neural architectures. Moreover, we give an overview of current applications and future directions to stimulate further research.
    Statistical learning on measures: an application to persistence diagrams. (arXiv:2303.08456v1 [cs.CG])
    We consider a binary supervised learning classification problem where instead of having data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$. Formally, we observe data $D_N = (\mu_1, Y_1), \ldots, (\mu_N, Y_N)$ where $\mu_i$ is a measure on $\mathcal{X}$ and $Y_i$ is a label in $\{0, 1\}$. Given a set $\mathcal{F}$ of base-classifiers on $\mathcal{X}$, we build corresponding classifiers in the space of measures. We provide upper and lower bounds on the Rademacher complexity of this new class of classifiers that can be expressed simply in terms of corresponding quantities for the class $\mathcal{F}$. If the measures $\mu_i$ are uniform over a finite set, this classification task boils down to a multi-instance learning problem. However, our approach allows more flexibility and diversity in the input data we can deal with. While such a framework has many possible applications, this work strongly emphasizes on classifying data via topological descriptors called persistence diagrams. These objects are discrete measures on $\mathbb{R}^2$, where the coordinates of each point correspond to the range of scales at which a topological feature exists. We will present several classifiers on measures and show how they can heuristically and theoretically enable a good classification performance in various settings in the case of persistence diagrams.
    Marginalising over Stationary Kernels with Bayesian Quadrature. (arXiv:2106.07452v3 [stat.ML] UPDATED)
    Marginalising over families of Gaussian Process kernels produces flexible model classes with well-calibrated uncertainty estimates. Existing approaches require likelihood evaluations of many kernels, rendering them prohibitively expensive for larger datasets. We propose a Bayesian Quadrature scheme to make this marginalisation more efficient and thereby more practical. Through use of the maximum mean discrepancies between distributions, we define a kernel over kernels that captures invariances between Spectral Mixture (SM) Kernels. Kernel samples are selected by generalising an information-theoretic acquisition function for warped Bayesian Quadrature. We show that our framework achieves more accurate predictions with better calibrated uncertainty than state-of-the-art baselines, especially when given limited (wall-clock) time budgets.
    Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators. (arXiv:2303.08431v1 [cs.LG])
    Nonlinear control systems with partial information to the decision maker are prevalent in a variety of applications. As a step toward studying such nonlinear systems, this work explores reinforcement learning methods for finding the optimal policy in the nearly linear-quadratic regulator systems. In particular, we consider a dynamic system that combines linear and nonlinear components, and is governed by a policy with the same structure. Assuming that the nonlinear component comprises kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. Although the cost function is nonconvex in general, we establish the local strong convexity and smoothness in the vicinity of the global optimizer. Additionally, we propose an initialization mechanism to leverage these properties. Building on the developments, we design a policy gradient algorithm that is guaranteed to converge to the globally optimal policy with a linear rate.
    Online Active Learning for Soft Sensor Development using Semi-Supervised Autoencoders. (arXiv:2212.13067v2 [cs.LG] UPDATED)
    Data-driven soft sensors are extensively used in industrial and chemical processes to predict hard-to-measure process variables whose real value is difficult to track during routine operations. The regression models used by these sensors often require a large number of labeled examples, yet obtaining the label information can be very expensive given the high time and cost required by quality inspections. In this context, active learning methods can be highly beneficial as they can suggest the most informative labels to query. However, most of the active learning strategies proposed for regression focus on the offline setting. In this work, we adapt some of these approaches to the stream-based scenario and show how they can be used to select the most informative data points. We also demonstrate how to use a semi-supervised architecture based on orthogonal autoencoders to learn salient features in a lower dimensional space. The Tennessee Eastman Process is used to compare the predictive performance of the proposed approaches.
    Learning to Reconstruct Signals From Binary Measurements. (arXiv:2303.08691v1 [eess.SP])
    Recent advances in unsupervised learning have highlighted the possibility of learning to reconstruct signals from noisy and incomplete linear measurements alone. These methods play a key role in medical and scientific imaging and sensing, where ground truth data is often scarce or difficult to obtain. However, in practice, measurements are not only noisy and incomplete but also quantized. Here we explore the extreme case of learning from binary observations and provide necessary and sufficient conditions on the number of measurements required for identifying a set of signals from incomplete binary data. Our results are complementary to existing bounds on signal recovery from binary measurements. Furthermore, we introduce a novel self-supervised learning approach, which we name SSBM, that only requires binary data for training. We demonstrate in a series of experiments with real datasets that SSBM performs on par with supervised learning and outperforms sparse reconstruction methods with a fixed wavelet basis by a large margin.
    Predicting Individualized Effects of Internet-Based Treatment for Genito-Pelvic Pain/Penetration Disorder: Development and Internal Validation of a Multivariable Decision Tree Model. (arXiv:2303.08732v1 [stat.AP])
    Genito-Pelvic Pain/Penetration-Disorder (GPPPD) is a common disorder but rarely treated in routine care. Previous research documents that GPPPD symptoms can be treated effectively using internet-based psychological interventions. However, non-response remains common for all state-of-the-art treatments and it is unclear which patient groups are expected to benefit most from an internet-based intervention. Multivariable prediction models are increasingly used to identify predictors of heterogeneous treatment effects, and to allocate treatments with the greatest expected benefits. In this study, we developed and internally validated a multivariable decision tree model that predicts effects of an internet-based treatment on a multidimensional composite score of GPPPD symptoms. Data of a randomized controlled trial comparing the internet-based intervention to a waitlist control group (N =200) was used to develop a decision tree model using model-based recursive partitioning. Model performance was assessed by examining the apparent and bootstrap bias-corrected performance. The final pruned decision tree consisted of one splitting variable, joint dyadic coping, based on which two response clusters emerged. No effect was found for patients with low dyadic coping ($n$=33; $d$=0.12; 95% CI: -0.57-0.80), while large effects ($d$=1.00; 95%CI: 0.68-1.32; $n$=167) are predicted for those with high dyadic coping at baseline. The bootstrap-bias-corrected performance of the model was $R^2$=27.74% (RMSE=13.22).
    Policy learning "without'' overlap: Pessimism and generalized empirical Bernstein's inequality. (arXiv:2212.09900v2 [cs.LG] UPDATED)
    This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn the optimal individualized decision rule in a given class. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics are lower bounded in the offline dataset. In other words, the performance of these methods depends on the worst-case propensity in the offline dataset. As one has no control over the data collection process, this assumption can be unrealistic in many situations, especially when the behavior policies are allowed to evolve over time with diminishing propensities. In this paper, we propose a new algorithm that optimizes lower confidence bounds (LCBs) -- instead of point estimates -- of the policy values. The LCBs are constructed by quantifying the estimation uncertainty of the augmented inverse propensity weighted (AIPW)-type estimators using knowledge of the behavior policies for collecting the offline data. Without assuming any uniform overlap condition, we establish a data-dependent upper bound for the suboptimality of our algorithm, which depends only on (i) the overlap for the optimal policy, and (ii) the complexity of the policy class. As an implication, for adaptively collected data, we ensure efficient policy learning as long as the propensities for optimal actions are lower bounded over time, while those for suboptimal ones are allowed to diminish arbitrarily fast. In our theoretical analysis, we develop a new self-normalized concentration inequality for IPW estimators, generalizing the well-known empirical Bernstein's inequality to unbounded and non-i.i.d. data.
    Understanding Post-hoc Explainers: The Case of Anchors. (arXiv:2303.08806v1 [stat.ML])
    In many scenarios, the interpretability of machine learning models is a highly required but difficult task. To explain the individual predictions of such models, local model-agnostic approaches have been proposed. However, the process generating the explanations can be, for a user, as mysterious as the prediction to be explained. Furthermore, interpretability methods frequently lack theoretical guarantees, and their behavior on simple models is frequently unknown. While it is difficult, if not impossible, to ensure that an explainer behaves as expected on a cutting-edge model, we can at least ensure that everything works on simple, already interpretable models. In this paper, we present a theoretical analysis of Anchors (Ribeiro et al., 2018): a popular rule-based interpretability method that highlights a small set of words to explain a text classifier's decision. After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results when used with linear text classifiers on top of a TF-IDF vectorization. We believe that our analysis framework can aid in the development of new explainability methods based on solid theoretical foundations.

  • Open

    [D]What's the best prompt to image generator out there?
    I was thinking the answer was dall-e 2 but I am not sure anymore. What's the best online text-to-image generator? submitted by /u/Periplokos [link] [comments]  ( 42 min )
    [D] Any other ICML reviewers noticing strange scores for the papers they're assigned to?
    I'm reviewing 4 papers, of which I gave one a very positive review. I am the only negative reviewer for 3/4 of the papers I am reviewing. Most of the papers have short, glowing positive reviews that don't meaningfully engage with the paper at all. At least two of the papers have bizarre formatting problems like blurry figures with unreadable text (not publication quality) that don't pass the eye test. A similar thing happened at ICLR reviews this year, and the authors withdrew their papers in spite of having 2x very positive reviews and 1x slightly negative review (mine). No attempt at rebuttal. Has anybody else experienced this? submitted by /u/GinoAcknowledges [link] [comments]  ( 44 min )
    [D] Our community must get serious about opposing OpenAI
    OpenAI was founded for the explicit purpose of democratizing access to AI and acting as a counterbalance to the closed off world of big tech by developing open source tools. They have abandoned this idea entirely. Today, with the release of GPT4 and their direct statement that they will not release details of the model creation due to "safety concerns" and the competitive environment, they have created a precedent worse than those that existed before they entered the field. We're at risk now of other major players, who previously at least published their work and contributed to open source tools, close themselves off as well. AI alignment is a serious issue that we definitely have not solved. Its a huge field with a dizzying array of ideas, beliefs and approaches. We're talking about trying to capture the interests and goals of all humanity, after all. In this space, the one approach that is horrifying (and the one that OpenAI was LITERALLY created to prevent) is a singular or oligarchy of for profit corporations making this decision for us. This is exactly what OpenAI plans to do. I get it, GPT4 is incredible. However, we are talking about the single most transformative technology and societal change that humanity has ever made. It needs to be for everyone or else the average person is going to be left behind. We need to unify around open source development; choose companies that contribute to science, and condemn the ones that don't. This conversation will only ever get more important. submitted by /u/SOCSChamp [link] [comments]  ( 58 min )
    [N] Mozilla launched a responsible AI challenge and I'm stoked about it
    who's applying and what are you planning to build??? https://www.axios.com/2023/03/15/mozilla-responsible-ai-challenge submitted by /u/joodfish [link] [comments]  ( 43 min )
    [D] GPT-3 will ignore tools when it disagrees with them
    https://vgel.me/posts/tools-not-needed/ submitted by /u/MysteryInc152 [link] [comments]  ( 43 min )
    [N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever
    Preview of the post since it's dropping in a few hours: https://deploy-preview-1313--pytorch-dot-org-preview.netlify.app/blog/pytorch-2.0-release/ Also a post about Accelerated Diffusers with 2.0: https://deploy-preview-1315--pytorch-dot-org-preview.netlify.app/blog/accelerated-diffusers-pt-20/ GPT Summary: PyTorch 2.0 is a next generation release that offers faster performance and support for dynamic shapes and distributed training using torch.compile as the main API. PyTorch 2.0 also includes a stable version of Accelerated Transformers, which use custom kernels for scaled dot product attention and are integrated with torch.compile. Other beta features include PyTorch MPS Backend for GPU-accelerated training on Mac platforms, functorch APIs in the torch.func module, and AWS Graviton3 optimization for CPU inference. The release also includes prototype features and technologies across TensorParallel, DTensor, 2D parallel, TorchDynamo, AOTAutograd, PrimTorch and TorchInductor. submitted by /u/m____ke [link] [comments]  ( 43 min )
    [D] Closed Domain Chat-GPT / LLM wrangling
    What's the status quo on finetuning LLMs or otherwise controlling them for closed-domain interactions? Is it basically a) prompt engineering to give a LLM a personality and mission statement to only respond according to their given closed-domain status or b) Not prompt engineering. Finetuning and backing with domain-specific knowledge graphs, etc, , such that the closed-domain LLM truly knows its trained domain and won't start giving general world knowledge? ​ Whats the latest? submitted by /u/ZestyData [link] [comments]  ( 43 min )
    [D] Is there an expectation that epochs/learning rates should be kept the same between benchmark experiments?
    I've found that by dramatically lowering the LR and increasing the number of epochs, very simple, baseline models can outperform SoTA models which use far more parameters. Is this considered "cheating" when comparing models? Is this something interesting enough to warrant a short paper? I'm not sure what to do with this information. For example, in the original VGAE paper, when training a GAE, they use a LR of 0.01, and train for 200 epochs to get 0.91 AUC, 0.92 AP on a link prediction experiment. Rerunning the same experiment with a LR of 5e-5 for 1500 epochs gets 0.97 AUC, 0.97 AP which is better than the current leader on papers with code for this dataset. It needs more epochs but has way, way fewer parameters than SoTA models, is this a valid trade-off? Is this even a fair comparison? submitted by /u/TheWittyScreenName [link] [comments]  ( 49 min )
    [R] ConvNextV2
    Hello, I was reading the Convnext2 paper. Apparently they added what they call a global normalization layer to encourage features diversity. I understand the equations but I fail to understand how it encourages features diversity. If anyone have any clue I will grateful. Thanks ! submitted by /u/Meddhouib10 [link] [comments]  ( 45 min )
    [D] Alternatives to Mediapipe's FaceMesh for 3D Face Reconstruction
    Hi there, Currently, I am using mediapipe for FaceMesh, which has decent reliability and is easy to setup in Python. However, I recently discovered Microsoft Research's "3D Face Reconstruction with Dense Landmarks" paper, which appears to be a much better alternative. Does anyone know where I can access Microsoft DenseLandmarks or an equally good alternative? submitted by /u/pakonsy [link] [comments]  ( 43 min )
    [D] To those of you who quit machine learning, what do you do now?
    I'm currently doing my master's degree and have been set on a DL-related career for a while. But recently I noticed it doesn't bring me joy. Coming up with architectures that randomly work/don't work, tuning parameters, waiting for days till the model is trained... the level of uncertainty is just too high for me. Because of that, I don't feel productive working on it and I'm slowly considering switching to another IT field. For those of you who quit machine learning (especially deep learning): What did you switch to? Are you satisfied with your new job? (Is it stressful/intellectually challenging? Is it possible to keep it 9-5?) How to ensure a smooth transition to that field? Thanks in advance! ___ PS I know machine learning isn't all about deep learning, but in my current subfield (computer vision), mostly deep learning is used. submitted by /u/nopainnogain5 [link] [comments]  ( 51 min )
    [P] 24 Fugues (music) in the style of J.S. Bach. Completely generated by a BERT inspired transformer model.
    Here are the samples. My favourite is this one! Which one is your favourite? These samples are the product of a transformer (encoder) model trained on only 3 hours of music. Each sample is seeded by the first four bars of a real piece of music. These are the final samples before I completely overhaul the pre-training stage. The idea is to go from about 2-hours of midi to over 500 hours. I'm very excited to see how this effects the sample quality. If anyone in interesting in following the project. Star the GitHub and follow me on Twitter. submitted by /u/ustainbolt [link] [comments]  ( 43 min )
    [Discussion] What happened to r/NaturalLanguageProcessing ?
    It seems to be locked right now. Was there brigading or sth of that fashion? submitted by /u/MadNietzsche [link] [comments]  ( 43 min )
    [D] When to expect announcement of accepted workshops for IJCAI?
    According to their schedule, IJCAI has sent acceptance notification to workshops organizers at March 6th. When should we expect that the accepted workshop list will be available? submitted by /u/GratisSlagroom [link] [comments]  ( 42 min )
    [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them?
    It seems like the days for open research in AI are gone. Also, since one of the main reasons they say about not releasing any details is competetive pressure (aka commercial interest), I feel it is fair for others to enforce their patents just like in other fields like pharma? I am very interested in the counter arguments and understanding around this. submitted by /u/NoScallion2450 [link] [comments]  ( 7 min )
    [R] Is there any NeRF labelling standard?
    I recently decided to introduce NeRFs in my research. So far, I got a theoretical understanding, but I am struggling to get an overview of the implementation side. I checked multiple implementations and realised there is no standard labelling format. Here, the labels are the camera parameters of each image required by NeRF. I have seen a tendency to use a .json file typically called "transforms". ​ I am mentioning this because I am creating a NeRF dataset of synthetic objects, and I found myself slightly lost when creating labels for the available implementations. I thought of getting some advice from you guys. ​ So far, I have the multiview images and the camera parameters of each image in multiple txt files, each with the camera parameters of the respective image. My initial idea was to write a code to transform these .txt files into json files with the same format as the "transforms" files. Before, I would like to ask your opinion about the "transforms" format. Also, I would like to ask you if you are aware of an alternative method/code/repo to generate these NeRF labels. ​ Thank you so much beforehand, submitted by /u/aiazar [link] [comments]  ( 44 min )
    [D] Testing regimes for Data Science & MLOps code
    Hi folks. I'm looking into best practice for serving models in production environments (e.g. MLOps or equivalent buzz-term). We want to include testing for our models as part of a CI/CD pipeline, but are at a bit of a loss as to what this should look like. I've seen plenty of blog posts on using different packages to carry out testing, but remain unsure what the tests should look like / actually be testing for. For example, say I have a model that takes in a text string, uses NER to identify key entities, and redacts a subset of those entities (in this case, for removal of Personally Identifiable Information) before returning the redacted text string - what would a robust unit test look like for such a model? Would it be suitable to just test that text in gives text out, or do we need to be testing that the expected entities are redacted, or are both tests required? Any tips on standard testing approaches for different problems would be appreciated, like regression problems, computer vision and NLP. submitted by /u/alex_0528 [link] [comments]  ( 43 min )
    [D] ChatGPT Plus waitlist
    I was surprised to find that ChatGPT plus (currently the only way to test a vanilla GPT-4 model) is not only behind a pay wall, it is also behind a "wait wall"! Has anyone played with GPT-4 yet? Is it as good as the paper suggests? Anyone got any idea how long the wait list is for access? submitted by /u/blabboy [link] [comments]  ( 44 min )
    [D] Vegetarian Wolves and Stochastic Parrots: The Future of Prompt Engineering with GPT-4?
    In today's announcement on Hacker News I saw an incredulous comment pointing out GPT-4's failure to solve variations of the wolf, goat, and cabbage problem, using this to dismiss it as anything more than a stochastic parrot. But in my own experience with GPT-4 though Bing chat, I'm constantly being reminded of Li et al Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (2023). So I tried a variation of this puzzle with a vegetarian wolf and a meat-eating goat. It absolutely did mess up generating an answer, but it also appeared to be able to identify where it was making mistakes under Socratic follow up questioning. It just couldn't get the solution out, and I knew there was a way to help engineer it out of this rut if only I could break the predictiv…  ( 50 min )
    [D] GPT-4 Speculation
    Hi, Since GPT-4 paper does not contain any information about architectures/parameters, as a research or ML practitioner, I want to speculate on what they did to increase the context window to 32k. Because for the type of work I do, a 4k or 8k token limit is pretty much useless. I have seen open-source efforts focused more on matching the number of parameters and quality to the closed-source ones but completely ignoring a giant elephant in the room, i.e., the context window. No OSS model has a context window greater than 2k tokens. I would love to hear more thoughts on the model size (my guess is ~50 B) and how they fit 32k tokens in 8xH100 (640 GB total) GPUs. submitted by /u/super_deap [link] [comments]  ( 46 min )
    [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group?
    We have a 20-40k budget at our lab and we are interested in training LLMs on data that is protected by HIPAA which puts restrictions on using just any cloud provider. We'd need a compute environment with 256gb vram. Would it be better to use AWS EC2 P3 instances or Google Cloud instead of trying to build our own server for this? We could spend the budget on a local server, but would this be obsolete within 2 years once the next gen GPUs are released? submitted by /u/PK_thundr [link] [comments]  ( 45 min )
    [D] Anyone else witnessing a panic inside NLP orgs of big tech companies?
    I'm in a big tech company working along side a science team for a product you've all probably used. We have these year long initiatives to productionalize "state of the art NLP models" that are now completely obsolete in the face of GPT-4. I think at first the science orgs were quiet/in denial. But now it's very obvious we are basically working on worthless technology. And by "we", I mean a large organization with scores of teams. Anyone else seeing this? What is the long term effect on science careers that get disrupted like this? Whats even more odd is the ego's of some of these science people Clearly the model is not a catch all, but still submitted by /u/thrwsitaway4321 [link] [comments]  ( 67 min )
    [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live)
    Baidu will unveil its conversational AI ERNIE Bot, powered by Baidu's in-house LLMs, on March 16. The ERNIE LLM was first proposed as a language understanding model in 2019 and evolved to ERNIE 3.0 Titan with 260 billion parameters. ERNIE 1.0: https://arxiv.org/abs/1904.09223 ERNIE 2.0: https://arxiv.org/abs/1907.12412 ERNIE 3.0: https://arxiv.org/abs/2112.12731 ERNIE for text-to-image: https://arxiv.org/abs/2210.15257 ERNIE Bot live-stream on YouTube: https://www.youtube.com/watch?v=ukvEUI3x0vI submitted by /u/kizumada [link] [comments]  ( 43 min )
    [D]Title: My Discovery of the Fourier Multi-Attention Mechanism
    Hey everyone, I wanted to share an exciting discovery I made while working on my deep neural network. I've successfully implemented a novel approach called the Fourier multi-attention mechanism in my model, and it has shown remarkable results. In case you're not familiar with it, the Fourier multi-attention mechanism combines Fourier transform and self-attention mechanisms to create a more efficient and accurate deep neural network. This approach enables the model to process and analyze data in a hierarchical way, making it much more powerful than traditional neural networks. Through my experimentation and research, I've found that the Fourier multi-attention mechanism greatly enhances the performance of the deep neural network, and can significantly reduce the computational costs of processing large amounts of data. What makes my discovery even more exciting is that it's a novel approach that has not been widely used in the field of artificial intelligence. I'm proud to have made this discovery and to be able to incorporate it into my deep neural network. With this new approach, I believe we can make significant strides in the development of more efficient and powerful AI systems. It's exciting to be a part of this groundbreaking technology, and I look forward to seeing where it takes us in the future. Thanks for reading! submitted by /u/Professional-Top854 [link] [comments]  ( 43 min )
  • Open

    What are some of the best AI image upscaling tool/websites do you use?
    I'm looking for recommendations on AI image upscaling tools or websites that you have personally used and found effective. I find myself working with low-resolution photos and I'm looking for a way to enhance their quality without losing too much detail. I did try to Google the answer to this question, but the last time I found someone asking was over 10 months ago. With new AI tools like GPT4 rapidly coming out, I'm wondering if there are better options available now. submitted by /u/yvgh233 [link] [comments]  ( 41 min )
    Turning drawings into images with the Visuali Editor
    submitted by /u/aigeneration [link] [comments]  ( 41 min )
    With all eyes on AI this year and the rate of progress, is Singularity an inevitability within our life time?
    Looking back in history, we've made predictions in the face of technology trends and high optimism before, only to be disappointed and wrong many times. With the rapid pace and progress of AI, will we see AGI and Singularity event in our life time? I did write an in-depth article on this topic and do believe we will see it sooner than most expect and agree with Kurzweil's predictions with his support evidence in technology trends. I recently listened to a conversation between Naval Ravikant and David Deustch on AI and other topics, and Deustch's perspective is we're not on the right direction of AGI, since AGI would be to learn and develop knowledge of things without the need of supervised learning and rules and information that it wasn't trained or programmed on. Human's also need supervised learning from other humans. I think we'll get there soon but perhaps he's right that supervised learning may only go so far. Deustch also had an interesting conversation with Sam Harris about AI concerns and he thinks they're overblown and we'll have time to work with AGI, to help mitigate any AI apocalyptic risks. With more insight into GPT-4's capabilities, it's hard to say we're not headed in the right direction overall. submitted by /u/dangtheory [link] [comments]  ( 42 min )
    Bing Chatgpt hides Uighur gencocide info
    submitted by /u/Hytsol [link] [comments]  ( 41 min )
    I built a blog where people write prompts and an AI writes a post based on it. Help me proposing more prompts!
    submitted by /u/JaviFesser [link] [comments]  ( 41 min )
    Eyeball – The AI System Changing Soccer Scouting
    submitted by /u/webmanpt [link] [comments]  ( 41 min )
    Text to speech rappers
    submitted by /u/DANGERD0OM [link] [comments]  ( 6 min )
    Microsoft lays off its entire AI Ethics and Society team
    submitted by /u/Prunestand [link] [comments]  ( 41 min )
    Live Wallpaper 4K - Wonderful EPIC Jungle Landscape (AI Dream 172)
    submitted by /u/LordPewPew777 [link] [comments]  ( 41 min )
    Drawed this in paint.
    submitted by /u/Salt-Entertainer3777 [link] [comments]  ( 41 min )
    OpenChatKit: Open-source competition for ChatGPT?
    submitted by /u/Number_5_alive [link] [comments]  ( 41 min )
    First Sale on Site that shows Chronicles by AI
    Got my first sale on a new website i launched! Launched it one week ago so really happy to have already gotten a sale. I post stories written by AI and update them daily. Also have a merch store and you can order custom framed stories (chronicles). please don't be too harsh on the idea lol, its my first ever site. submitted by /u/Sufficient-Yellow461 [link] [comments]  ( 41 min )
    AI image recognition software
    I am looking for a free AI software that can recognize an image I upload, and everytime it sees the image in a selected part of the screen, it clicks on another location on the screen. For example: https://preview.redd.it/6mk0ztpb0yna1.png?width=471&format=png&auto=webp&s=a60279cc9a678e21daa4521272f3ae129b857477 If the AI recognizes the images in the selected yellow rectangle, it will press specific images in the orange box and click done.(I'm aware the orange box would have to be dissected into multiple ones for each image) submitted by /u/fafanna1 [link] [comments]  ( 41 min )
    Artificial Intelligence Helps Detect Asbestos on Residential Buildings
    submitted by /u/webmanpt [link] [comments]  ( 6 min )
    Are we working for free for AI companies?
    submitted by /u/israelavila [link] [comments]  ( 41 min )
    AI to summarize research Articles
    I have been trying to get chat GPT to summarize research articles but i've run into some obstacles. When I provide a link to the free articles whether it is HTML or PDF url, it creates a summary on an entirely different article. When I give it the full title article, I get a little closer, but the data does not match the article, the name of the journal is incorrect and the authors are incorrect. Although the gist of the article is accurate. I was trying to see if there was I way I could upload the pdf myself for it to be analyzed, but haven't found a way to do that. ​ Any suggestions? Thank you submitted by /u/ARIandOtis [link] [comments]  ( 42 min )
    Karpathy says GPT-4 solves his "state of computer vision" problem
    submitted by /u/npsedhain [link] [comments]  ( 42 min )
    GPT-4 shows emergent Theory of Mind on par with an adult. It scored in the 85+ percentile for a lot of major college exams. It can also do taxes and create functional websites from a simple drawing
    submitted by /u/lostlifon [link] [comments]  ( 47 min )
    Google launches PaLM API and generative AI for cloud
    submitted by /u/Peaking_AI [link] [comments]  ( 41 min )
    Microsoft lays off team that taught employees how to make AI tools responsibly
    submitted by /u/vjmde [link] [comments]  ( 41 min )
    Boost Your Grades with Caktus AI - A Must Have Tool for Students
    submitted by /u/webmanpt [link] [comments]  ( 41 min )
    GPT-4 released today. Here’s what was in the demo
    Here’s what it did in a 20 minute demo created a discord bot in seconds live debugged errors and read the entire documentation Explained images very well Proceeded to create a functioning website prototype from a hand drawn image Using the api also gives you 32k tokens which means every time you tell it something, you can feed it roughly 100 pages of text. The fact that ChatGPT released just 4 months ago and now we’re here is insane. Try it here submitted by /u/lostlifon [link] [comments]  ( 41 min )
    Ideas to reverse engineer sourcing / footnotes with AI.
    In the early days of the internet, I had a blog I had written a few =on somewhat obscure professional athletes based on stuff I had heard or picked up during my fandom. I probably had five readers, so I never footnoted or attributed properly...It has occurred to me that this could be a good book. Can anyone suggest a methodology for using AI to find attributions / fact check existing writing? submitted by /u/bbcard1 [link] [comments]  ( 41 min )
    Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live)
    Baidu will unveil its conversational AI ERNIE Bot, powered by Baidu's in-house LLMs, on March 16. The ERNIE LLM was first proposed as a language understanding model in 2019 and evolved to ERNIE 3.0 Titan with 260 billion parameters. ERNIE 1.0: Enhanced Representation through Knowledge Integration https://arxiv.org/abs/1904.09223 ERNIE 2.0: A Continual Pre-training Framework for Language Understanding https://arxiv.org/abs/1907.12412 ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation https://arxiv.org/abs/2112.12731 ERNIE for text-to-image: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts https://arxiv.org/abs/2210.15257 ERNIE Bot live-stream on YouTube: https://www.youtube.com/watch?v=ukvEUI3x0vI submitted by /u/kizumada [link] [comments]  ( 41 min )
    GPTMinusOne - AI that hides the use of ChatGPT and GPT4
    submitted by /u/tomd_96 [link] [comments]  ( 6 min )
    GPT-4 Has Arrived — Here’s What You Should Know
    submitted by /u/arnolds112 [link] [comments]  ( 41 min )
  • Open

    Step by step tutorial to understand multi-armed bandit
    Hi All, Can anbody please point me to any guided and hands on tutorial (with some Python code involved ) that will help me understand multi-armed bandit better? Something which is simpler to implement than Andrej Karpathy's Pong from pixels by yet touches upon the core concepts. ​ Thanks, Sau submitted by /u/Sau001 [link] [comments]  ( 41 min )
    CrowdPlay - a platform for recording human interactions with RL environments for offline reinforcement learning - has now joined the Farama Foundation
    submitted by /u/jkterry1 [link] [comments]  ( 41 min )
    Ideas for MARL project?
    I a Msc. student of Data Science with a mix of basic and intermediate knowledge about RL. This semester I am taking the RL course offered by my faculty, which requires me to develop a project based on RL algorithms throughout the semester. The project doesn't require to be deployed on hardware such as sensors or Arduino, which is a relief as my major is in Economics and I do not have the skills to work with circuits and cables. Finally, I would very much like to work on a project based on a MARL approach. Feel free to leave any ideas! submitted by /u/stinoco [link] [comments]  ( 41 min )
    Hyperparameter tuning and evaluation protocol for Atari
    What’s the usual protocol for hyperparameter tuning in Atari? Is it something like: pick different hyperparam choices, run them for 5 random seeds, pick the best average.Or, do the above, pick the best average hyperparam choice and run that again on newer choice of 5 random seeds? What’s the common practice? submitted by /u/FixOpening [link] [comments]  ( 41 min )
    RL people in the industry
    I am a Ph.D. student who wants to go into industry after graduation. If got an RL job, could you please share anything about your work? e.g., your daily routine, required skills, and maybe salary. submitted by /u/Blasphemer666 [link] [comments]  ( 6 min )
    A Simple AlphaZero Implementation
    Hello everyone, I'd like to show you a "working AlphaZero implementation that's simple enough to be able to understand what's going on at a quick glance, without sacrificing too much." Link: https://github.com/scascin0/alphazero submitted by /u/ayan0k0ji [link] [comments]  ( 41 min )
    Guiding exploration with clustering algorithms
    EDIT: The title is wrong. I have another idea that uses clustering algorithms but I have not made a post about it yet. This title should be "Guiding exploration using surprise in environment models" ​ Hello, this is the sequel to my other post. I'm looking for the same general kind of feedback on whether this sounds interesting and if it has been tried already. I was thinking of ways to sample a state/action space more intelligently than just random sampling during the earliest interactions of an RL agent, before it has gathered enough data to meaningfully shape its policy. This would be especially valuable in extremely large environments and even more so if rewards are rather sparse. For example, it occurred to me that even in a simple environment like a gridworld, if you up the size …  ( 53 min )
    PPO Discrete converges to choosing the same action always
    I have an environment where the agent chooses a point from each of two fixed-size grids of size 15x15. I used a MultiDiscrete action space for the same. Some of the points are infeasible (hence I can either give negative rewards or mask the infeasible actions). However, after prolonged training, the agent seems to have converged to a policy that chooses the same action at every state, sometimes even the infeasible masked action. I have also noticed a similar behaviour without the masked action on discrete action space. What could be the reason for this? FYI: I have been using the Stable-Baselines-3 package along with MaskablePPO of sb3-contrib for action masking. Some hyper-param info: ``` "n_steps": 256,"batch_size": int((cfg['environment']['num_envs']*256)/2),"gamma": 0.995,"learning_rate": 5e-4,"ent_coef": 0.001,"clip_range": 0.2,"n_epochs": 5,"gae_lambda": 0.96,"max_grad_norm": 0.9,"vf_coef": 0.5, net_arch = {"large": [dict(pi=[512, 256, 128], vf=[512, 256, 128])]} ``` Here are some of the training plots for your reference, just in case: https://preview.redd.it/cqg2orhqsvna1.png?width=1544&format=png&auto=webp&s=27bd0412eb9ac43479728125fa3eb4dabf9d23a7 Thanks in advance. submitted by /u/Latter_Bid3254 [link] [comments]  ( 7 min )
    RL Algorithm Idea Using a modified Q-value estimation network
    Hello, I am a layman in this field with no academic experience but I have a deep passion for it. I have had two ideas recently that I am working on implementing right now. It is somewhat slow-going because I am fairly inexperienced. I want to describe the ideas here to A) get general feedback on whether they sound promising or even just interesting, B) find out from anyone if similar approaches have been tried, C) potentially spark a conversation with someone who might have some interest in collaborating to implement and test these ideas. Anyway, here goes for my first idea. This one is for action choice in environment with continuous action spaces. You would train a network that accepts both state and action choice as inputs, and it outputs: not the estimated Q-values of each availabl…  ( 45 min )
    Do RL models overfit? If so, how to detect such a case?
    Hi! I am training a Double-DQN (with a soft policy update (similar to DDPG)). The implementation is similar to this post. I initially trained such a model for about 5k iterations and got a decent performance from that checkpoint. Later, I trained it for 50k iterations and got a model that does well in most cases, but sometimes, its performance is worse than the 5k model. (This is a qualitative analysis). How would one know when to stop training, and which checkpoint to go for? Thanks in advance for any suggestions/comments! submitted by /u/High-Free-Energy [link] [comments]  ( 44 min )
    I am looking for advice about a replay buffer in DQN.
    Hi everyone. I'm a newbie to reinforcement learning. I'm writing to ask a question about replay buffers while studying basic DQN. I understand that the replay buffer stores tuples like State-Action, etc. Are these tuples only stored during training, or are they also stored during the evaluation? ​ Thanks in advance for your help. submitted by /u/ComfortablePaint3223 [link] [comments]  ( 42 min )
  • Open

    PepsiCo Leads in AI-Powered Automation With KoiVision Platform
    Global leader in convenient foods and beverages PepsiCo is deploying advanced machine vision technology from startup KoiReader Technologies, powered by the NVIDIA AI platform and GPUs, to improve efficiency and accuracy in its distribution process. PepsiCo has identified KoiReader’s technology as a solution to enable greater efficiency in reading warehouse labels. This AI-powered innovation helps Read article >  ( 5 min )
    Apple of My AI: Startup Sprouts Multitasking Farm Tool for Organics
    It all started with two software engineers and a tomato farmer on a West Coast road trip. Visiting farms to survey their needs, the three hatched a plan at an apple orchard: build a highly adaptable 3D vision AI system for automating field tasks. Verdant, based in the San Francisco Bay Area, is developing AI Read article >  ( 7 min )
  • Open

    Bring legacy machine learning code into Amazon SageMaker using AWS Step Functions
    Tens of thousands of AWS customers use AWS machine learning (ML) services to accelerate their ML development with fully managed infrastructure and tools. For customers who have been developing ML models on premises, such as their local desktop, they want to migrate their legacy ML models to the AWS Cloud to fully take advantage of […]  ( 11 min )
  • Open

    Do we really understand why a network is working?
    Given some working image classificator. Do we really understand why, e.g. its architecture worked better than others and how exactly it classifies the images? I always felt (and heard) that building neural networks is a mix between science and black arts. Is this still somewhat true or kind of outdated? Thanks for any help submitted by /u/Halvv [link] [comments]  ( 42 min )
    selfmade ai not working on multiple Inputs for decision making
    Hi, I am trying to make the worst possible ai (probably). For that I build my own matrix calculations and neuronal network (set inputs, set outputs, set hiddenLayers and hiidden nodes). I tried it out on a canvas, where I had 5000 dots moving around and they got a randomly filled brain. I set 1 goal in there and gave each dot 2 inputs (dot.pos.x-goal.pos.x/canvasSize.X) same for y. Then I checked which dots were in a specific radius of the goal, saved them and shredded the rest. Then I took 80% of the survivors and made children of a pair of 2 each and mutated the newborn childs per random (chance of 5% to change a weight or bias field for a value of +- 0.5) This actually worked. After about 20 iterations I got good dots which were following the goal wherever it was. Now I wanted to h…  ( 45 min )
  • Open

    Nonparametric Multi-shape Modeling with Uncertainty Quantification. (arXiv:2206.09127v3 [stat.ML] UPDATED)
    The modeling and uncertainty quantification of closed curves is an important problem in the field of shape analysis, and can have significant ramifications for subsequent statistical tasks. Many of these tasks involve collections of closed curves, which often exhibit structural similarities at multiple levels. Modeling multiple closed curves in a way that efficiently incorporates such between-curve dependence remains a challenging problem. In this work, we propose and investigate a multiple-output (a.k.a. multi-output), multi-dimensional Gaussian process modeling framework. We illustrate the proposed methodological advances, and demonstrate the utility of meaningful uncertainty quantification, on several curve and shape-related tasks. This model-based approach not only addresses the problem of inference on closed curves (and their shapes) with kernel constructions, but also opens doors to nonparametric modeling of multi-level dependence for functional objects in general.
    SegViz: A federated-learning based framework for multi-organ segmentation on heterogeneous data sets with partial annotations. (arXiv:2301.07074v3 [cs.CV] UPDATED)
    Segmentation is one of the most primary tasks in deep learning for medical imaging, owing to its multiple downstream clinical applications. However, generating manual annotations for medical images is time-consuming, requires high skill, and is an expensive effort, especially for 3D images. One potential solution is to aggregate knowledge from partially annotated datasets from multiple groups to collaboratively train global models using Federated Learning. To this end, we propose SegViz, a federated learning-based framework to train a segmentation model from distributed non-i.i.d datasets with partial annotations. The performance of SegViz was compared against training individual models separately on each dataset as well as centrally aggregating all the datasets in one place and training a single model. The SegViz framework using FedBN as the aggregation strategy demonstrated excellent performance on the external BTCV set with dice scores of 0.93, 0.83, 0.55, and 0.75 for segmentation of liver, spleen, pancreas, and kidneys, respectively, significantly ($p<0.05$) better (except spleen) than the dice scores of 0.87, 0.83, 0.42, and 0.48 for the baseline models. In contrast, the central aggregation model significantly ($p<0.05$) performed poorly on the test dataset with dice scores of 0.65, 0, 0.55, and 0.68. Our results demonstrate the potential of the SegViz framework to train multi-task models from distributed datasets with partial labels. All our implementations are open-source and available at https://anonymous.4open.science/r/SegViz-B746
    Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks. (arXiv:2303.07200v2 [cs.NE] UPDATED)
    Feature selection that selects an informative subset of variables from data not only enhances the model interpretability and performance but also alleviates the resource demands. Recently, there has been growing attention on feature selection using neural networks. However, existing methods usually suffer from high computational costs when applied to high-dimensional datasets. In this paper, inspired by evolution processes, we propose a novel resource-efficient supervised feature selection method using sparse neural networks, named \enquote{NeuroFS}. By gradually pruning the uninformative features from the input layer of a sparse neural network trained from scratch, NeuroFS derives an informative subset of features efficiently. By performing several experiments on $11$ low and high-dimensional real-world benchmarks of different types, we demonstrate that NeuroFS achieves the highest ranking-based score among the considered state-of-the-art supervised feature selection models. The code is available on GitHub.
    Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning. (arXiv:2207.03902v2 [cs.LG] UPDATED)
    Deep cooperative multi-agent reinforcement learning has demonstrated its remarkable success over a wide spectrum of complex control tasks. However, recent advances in multi-agent learning mainly focus on value decomposition while leaving entity interactions still intertwined, which easily leads to over-fitting on noisy interactions between entities. In this work, we introduce a novel interactiOn Pattern disenTangling (OPT) method, to disentangle not only the joint value function into agent-wise value functions for decentralized execution, but also the entity interactions into interaction prototypes, each of which represents an underlying interaction pattern within a subgroup of the entities. OPT facilitates filtering the noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability. Specifically, OPT introduces a sparse disagreement mechanism to encourage sparsity and diversity among discovered interaction prototypes. Then the model selectively restructures these prototypes into a compact interaction pattern by an aggregator with learnable weights. To alleviate the training instability issue caused by partial observability, we propose to maximize the mutual information between the aggregation weights and the history behaviors of each agent. Experiments on both single-task and multi-task benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts. Our code is available at https://github.com/liushunyu/OPT.
    Blind Acoustic Room Parameter Estimation Using Phase Features. (arXiv:2303.07449v1 [eess.AS])
    Modeling room acoustics in a field setting involves some degree of blind parameter estimation from noisy and reverberant audio. Modern approaches leverage convolutional neural networks (CNNs) in tandem with time-frequency representation. Using short-time Fourier transforms to develop these spectrogram-like features has shown promising results, but this method implicitly discards a significant amount of audio information in the phase domain. Inspired by recent works in speech enhancement, we propose utilizing novel phase-related features to extend recent approaches to blindly estimate the so-called "reverberation fingerprint" parameters, namely, volume and RT60. The addition of these features is shown to outperform existing methods that rely solely on magnitude-based spectral features across a wide range of acoustics spaces. We evaluate the effectiveness of the deployment of these novel features in both single-parameter and multi-parameter estimation strategies, using a novel dataset that consists of publicly available room impulse responses (RIRs), synthesized RIRs, and in-house measurements of real acoustic spaces.
    Spatial Entropy as an Inductive Bias for Vision Transformers. (arXiv:2206.04636v3 [cs.CV] UPDATED)
    Recent work on Vision Transformers (VTs) showed that introducing a local inductive bias in the VT architecture helps reducing the number of samples necessary for training. However, the architecture modifications lead to a loss of generality of the Transformer backbone, partially contradicting the push towards the development of uniform architectures, shared, e.g., by both the Computer Vision and the Natural Language Processing areas. In this work, we propose a different and complementary direction, in which a local bias is introduced using an auxiliary self-supervised task, performed jointly with standard supervised training. Specifically, we exploit the observation that the attention maps of VTs, when trained with self-supervision, can contain a semantic segmentation structure which does not spontaneously emerge when training is supervised. Thus, we explicitly encourage the emergence of this spatial clustering as a form of training regularization. In more detail, we exploit the assumption that, in a given image, objects usually correspond to few connected regions, and we propose a spatial formulation of the information entropy to quantify this object-based inductive bias. By minimizing the proposed spatial entropy, we include an additional self-supervised signal during training. Using extensive experiments, we show that the proposed regularization leads to equivalent or better results than other VT proposals which include a local bias by changing the basic Transformer architecture, and it can drastically boost the VT final accuracy when using small-medium training sets. The code is available at https://github.com/helia95/SAR.
    Meta-learning approaches for few-shot learning: A survey of recent advances. (arXiv:2303.07502v1 [cs.LG])
    Despite its astounding success in learning deeper multi-dimensional data, the performance of deep learning declines on new unseen tasks mainly due to its focus on same-distribution prediction. Moreover, deep learning is notorious for poor generalization from few samples. Meta-learning is a promising approach that addresses these issues by adapting to new tasks with few-shot datasets. This survey first briefly introduces meta-learning and then investigates state-of-the-art meta-learning methods and recent advances in: (I) metric-based, (II) memory-based, (III), and learning-based methods. Finally, current challenges and insights for future researches are discussed.
    Multiway clustering of 3-order tensor via affinity matrix. (arXiv:2303.07757v1 [cs.LG])
    We propose a new method of multiway clustering for 3-order tensors via affinity matrix (MCAM). Based on a notion of similarity between the tensor slices and the spread of information of each slice, our model builds an affinity/similarity matrix on which we apply advanced clustering methods. The combination of all clusters of the three modes delivers the desired multiway clustering. Finally, MCAM achieves competitive results compared with other known algorithms on synthetics and real datasets.
    A Contrastive Knowledge Transfer Framework for Model Compression and Transfer Learning. (arXiv:2303.07599v1 [cs.LG])
    Knowledge Transfer (KT) achieves competitive performance and is widely used for image classification tasks in model compression and transfer learning. Existing KT works transfer the information from a large model ("teacher") to train a small model ("student") by minimizing the difference of their conditionally independent output distributions. However, these works overlook the high-dimension structural knowledge from the intermediate representations of the teacher, which leads to limited effectiveness, and they are motivated by various heuristic intuitions, which makes it difficult to generalize. This paper proposes a novel Contrastive Knowledge Transfer Framework (CKTF), which enables the transfer of sufficient structural knowledge from the teacher to the student by optimizing multiple contrastive objectives across the intermediate representations between them. Also, CKTF provides a generalized agreement to existing KT techniques and increases their performance significantly by deriving them as specific cases of CKTF. The extensive evaluation shows that CKTF consistently outperforms the existing KT works by 0.04% to 11.59% in model compression and by 0.4% to 4.75% in transfer learning on various models and datasets.
    Adaptive Policy Learning for Offline-to-Online Reinforcement Learning. (arXiv:2303.07693v1 [cs.LG])
    Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is impractical when online interactions are costly. Offline RL provides an alternative solution by directly learning from the previously collected dataset. However, it will yield unsatisfactory performance if the quality of the offline datasets is poor. In this paper, we consider an offline-to-online setting where the agent is first learned from the offline dataset and then trained online, and propose a framework called Adaptive Policy Learning for effectively taking advantage of offline and online data. Specifically, we explicitly consider the difference between the online and offline data and apply an adaptive update scheme accordingly, that is, a pessimistic update strategy for the offline dataset and an optimistic/greedy update scheme for the online dataset. Such a simple and effective method provides a way to mix the offline and online RL and achieve the best of both worlds. We further provide two detailed algorithms for implementing the framework through embedding value or policy-based RL algorithms into it. Finally, we conduct extensive experiments on popular continuous control tasks, and results show that our algorithm can learn the expert policy with high sample efficiency even when the quality of offline dataset is poor, e.g., random dataset.
    CECT: Controllable Ensemble CNN and Transformer for COVID-19 Image Classification. (arXiv:2302.02314v2 [eess.IV] UPDATED)
    Most computer vision models are developed based on either convolutional neural network (CNN) or transformer, while the former (latter) method captures local (global) features. To relieve model performance limitations due to the lack of global (local) features, we develop a novel classification network CECT by controllable ensemble CNN and transformer. CECT is composed of a convolutional encoder block, a transposed-convolutional decoder block, and a transformer classification block. Different from conventional CNN- or transformer-based methods, our CECT can capture features at both multi-local and global scales. Besides, the contribution of local features at different scales can be controlled with the proposed ensemble coefficients. We evaluate CECT on two public COVID-19 datasets and it outperforms existing state-of-the-art methods on all evaluation metrics. With remarkable feature capture ability, we believe CECT can be extended to other medical image classification scenarios as a diagnosis assistant.
    Sequential three-way decisions with a single hidden layer feedforward neural network. (arXiv:2303.07589v1 [cs.LG])
    The three-way decisions strategy has been employed to construct network topology in a single hidden layer feedforward neural network (SFNN). However, this model has a general performance, and does not consider the process costs, since it has fixed threshold parameters. Inspired by the sequential three-way decisions (STWD), this paper proposes STWD with an SFNN (STWD-SFNN) to enhance the performance of networks on structured datasets. STWD-SFNN adopts multi-granularity levels to dynamically learn the number of hidden layer nodes from coarse to fine, and set the sequential threshold parameters. Specifically, at the coarse granular level, STWD-SFNN handles easy-to-classify instances by applying strict threshold conditions, and with the increasing number of hidden layer nodes at the fine granular level, STWD-SFNN focuses more on disposing of the difficult-to-classify instances by applying loose threshold conditions, thereby realizing the classification of instances. Moreover, STWD-SFNN considers and reports the process cost produced from each granular level. The experimental results verify that STWD-SFNN has a more compact network on structured datasets than other SFNN models, and has better generalization performance than the competitive models. All models and datasets can be downloaded from https://github.com/wuc567/Machine-learning/tree/main/STWD-SFNN.
    Leveraging Demonstrations with Latent Space Priors. (arXiv:2210.14685v2 [cs.LG] UPDATED)
    Demonstrations provide insight into relevant state or action space regions, bearing great potential to boost the efficiency and practicality of reinforcement learning agents. In this work, we propose to leverage demonstration datasets by combining skill learning and sequence modeling. Starting with a learned joint latent space, we separately train a generative model of demonstration sequences and an accompanying low-level policy. The sequence model forms a latent space prior over plausible demonstration behaviors to accelerate learning of high-level policies. We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning on transfer tasks. Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance. We benchmark our approach on a set of challenging sparse-reward environments with a complex, simulated humanoid, and on offline RL benchmarks for navigation and object manipulation. Videos, source code and pre-trained models are available at the corresponding project website at https://facebookresearch.github.io/latent-space-priors .
    Don't PANIC: Prototypical Additive Neural Network for Interpretable Classification of Alzheimer's Disease. (arXiv:2303.07125v2 [cs.LG] UPDATED)
    Alzheimer's disease (AD) has a complex and multifactorial etiology, which requires integrating information about neuroanatomy, genetics, and cerebrospinal fluid biomarkers for accurate diagnosis. Hence, recent deep learning approaches combined image and tabular information to improve diagnostic performance. However, the black-box nature of such neural networks is still a barrier for clinical applications, in which understanding the decision of a heterogeneous model is integral. We propose PANIC, a prototypical additive neural network for interpretable AD classification that integrates 3D image and tabular data. It is interpretable by design and, thus, avoids the need for post-hoc explanations that try to approximate the decision of a network. Our results demonstrate that PANIC achieves state-of-the-art performance in AD classification, while directly providing local and global explanations. Finally, we show that PANIC extracts biologically meaningful signatures of AD, and satisfies a set of desirable desiderata for trustworthy machine learning. Our implementation is available at https://github.com/ai-med/PANIC .
    Mitigating Algorithmic Bias with Limited Annotations. (arXiv:2207.10018v3 [cs.LG] UPDATED)
    Existing work on fairness modeling commonly assumes that sensitive attributes for all instances are fully available, which may not be true in many real-world applications due to the high cost of acquiring sensitive information. When sensitive attributes are not disclosed or available, it is needed to manually annotate a small part of the training data to mitigate bias. However, the skewed distribution across different sensitive groups preserves the skewness of the original dataset in the annotated subset, which leads to non-optimal bias mitigation. To tackle this challenge, we propose Active Penalization Of Discrimination (APOD), an interactive framework to guide the limited annotations towards maximally eliminating the effect of algorithmic bias. The proposed APOD integrates discrimination penalization with active instance selection to efficiently utilize the limited annotation budget, and it is theoretically proved to be capable of bounding the algorithmic bias. According to the evaluation on five benchmark datasets, APOD outperforms the state-of-the-arts baseline methods under the limited annotation budget, and shows comparable performance to fully annotated bias mitigation, which demonstrates that APOD could benefit real-world applications when sensitive information is limited.
    A Broad Ensemble Learning System for Drifting Stream Classification. (arXiv:2110.03540v2 [cs.LG] UPDATED)
    In a data stream environment, classification models must handle concept drift efficiently and effectively. Ensemble methods are widely used for this purpose; however, the ones available in the literature either use a large data chunk to update the model or learn the data one by one. In the former, the model may miss the changes in the data distribution, and in the latter, the model may suffer from inefficiency and instability. To address these issues, we introduce a novel ensemble approach based on the Broad Learning System (BLS), where mini chunks are used at each update. BLS is an effective lightweight neural architecture recently developed for incremental learning. Although it is fast, it requires huge data chunks for effective updates, and is unable to handle dynamic changes observed in data streams. Our proposed approach named Broad Ensemble Learning System (BELS) uses a novel updating method that significantly improves best-in-class model accuracy. It employs an ensemble of output layers to address the limitations of BLS and handle drifts. Our model tracks the changes in the accuracy of the ensemble components and react to these changes. We present the mathematical derivation of BELS, perform comprehensive experiments with 20 datasets that demonstrate the adaptability of our model to various drift types, and provide hyperparameter and ablation analysis of our proposed model. Our experiments show that the proposed approach outperforms nine state-of-the-art baselines and supplies an overall improvement of 13.28% in terms of average prequential accuracy.
    Statistical Complexity and Optimal Algorithms for Non-linear Ridge Bandits. (arXiv:2302.06025v2 [stat.ML] UPDATED)
    We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action. Compared with the linear model, two curious phenomena arise in non-linear models: first, in addition to the "learning phase" with a standard parametric rate for estimation or regret, there is an "burn-in period" with a fixed cost determined by the non-linear function; second, achieving the smallest burn-in cost requires new exploration algorithms. For a special family of non-linear functions named ridge functions in the literature, we derive upper and lower bounds on the optimal burn-in cost, and in addition, on the entire learning trajectory during the burn-in period via differential equations. In particular, a two-stage algorithm that first finds a good initial action and then treats the problem as locally linear is statistically optimal. In contrast, several classical algorithms, such as UCB and algorithms relying on regression oracles, are provably suboptimal.
    ISimDL: Importance Sampling-Driven Acceleration of Fault Injection Simulations for Evaluating the Robustness of Deep Learning. (arXiv:2303.08035v1 [cs.LG])
    Deep Learning (DL) systems have proliferated in many applications, requiring specialized hardware accelerators and chips. In the nano-era, devices have become increasingly more susceptible to permanent and transient faults. Therefore, we need an efficient methodology for analyzing the resilience of advanced DL systems against such faults, and understand how the faults in neural accelerator chips manifest as errors at the DL application level, where faults can lead to undetectable and unrecoverable errors. Using fault injection, we can perform resilience investigations of the DL system by modifying neuron weights and outputs at the software-level, as if the hardware had been affected by a transient fault. Existing fault models reduce the search space, allowing faster analysis, but requiring a-priori knowledge on the model, and not allowing further analysis of the filtered-out search space. Therefore, we propose ISimDL, a novel methodology that employs neuron sensitivity to generate importance sampling-based fault-scenarios. Without any a-priori knowledge of the model-under-test, ISimDL provides an equivalent reduction of the search space as existing works, while allowing long simulations to cover all the possible faults, improving on existing model requirements. Our experiments show that the importance sampling provides up to 15x higher precision in selecting critical faults than the random uniform sampling, reaching such precision in less than 100 faults. Additionally, we showcase another practical use-case for importance sampling for reliable DNN design, namely Fault Aware Training (FAT). By using ISimDL to select the faults leading to errors, we can insert the faults during the DNN training process to harden the DNN against such faults. Using importance sampling in FAT reduces the overhead required for finding faults that lead to a predetermined drop in accuracy by more than 12x.
    Label Information Bottleneck for Label Enhancement. (arXiv:2303.06836v2 [cs.LG] UPDATED)
    In this work, we focus on the challenging problem of Label Enhancement (LE), which aims to exactly recover label distributions from logical labels, and present a novel Label Information Bottleneck (LIB) method for LE. For the recovery process of label distributions, the label irrelevant information contained in the dataset may lead to unsatisfactory recovery performance. To address this limitation, we make efforts to excavate the essential label relevant information to improve the recovery performance. Our method formulates the LE problem as the following two joint processes: 1) learning the representation with the essential label relevant information, 2) recovering label distributions based on the learned representation. The label relevant information can be excavated based on the "bottleneck" formed by the learned representation. Significantly, both the label relevant information about the label assignments and the label relevant information about the label gaps can be explored in our method. Evaluation experiments conducted on several benchmark label distribution learning datasets verify the effectiveness and competitiveness of LIB. Our source codes are available https://github.com/qinghai-zheng/LIBLE
    Network Anomaly Detection Using Federated Learning. (arXiv:2303.07452v1 [cs.LG])
    Due to the veracity and heterogeneity in network traffic, detecting anomalous events is challenging. The computational load on global servers is a significant challenge in terms of efficiency, accuracy, and scalability. Our primary motivation is to introduce a robust and scalable framework that enables efficient network anomaly detection. We address the issue of scalability and efficiency for network anomaly detection by leveraging federated learning, in which multiple participants train a global model jointly. Unlike centralized training architectures, federated learning does not require participants to upload their training data to the server, preventing attackers from exploiting the training data. Moreover, most prior works have focused on traditional centralized machine learning, making federated machine learning under-explored in network anomaly detection. Therefore, we propose a deep neural network framework that could work on low to mid-end devices detecting network anomalies while checking if a request from a specific IP address is malicious or not. Compared to multiple traditional centralized machine learning models, the deep neural federated model reduces training time overhead. The proposed method performs better than baseline machine learning techniques on the UNSW-NB15 data set as measured by experiments conducted with an accuracy of 97.21% and a faster computation time.
    Physics-driven machine learning models coupling PyTorch and Firedrake. (arXiv:2303.06871v2 [cs.LG] UPDATED)
    Partial differential equations (PDEs) are central to describing and modelling complex physical systems that arise in many disciplines across science and engineering. However, in many realistic applications PDE modelling provides an incomplete description of the physics of interest. PDE-based machine learning techniques are designed to address this limitation. In this approach, the PDE is used as an inductive bias enabling the coupled model to rely on fundamental physical laws while requiring less training data. The deployment of high-performance simulations coupling PDEs and machine learning to complex problems necessitates the composition of capabilities provided by machine learning and PDE-based frameworks. We present a simple yet effective coupling between the machine learning framework PyTorch and the PDE system Firedrake that provides researchers, engineers and domain specialists with a high productive way of specifying coupled models while only requiring trivial changes to existing code.
    Style Feature Extraction Using Contrastive Conditioned Variational Autoencoders with Mutual Information Constraints. (arXiv:2303.08068v1 [cs.CV])
    It is crucial to extract fine-grained features such as styles from unlabeled data in data analysis. Unsupervised methods, such as variational autoencoders (VAEs), can extract styles, but the extracted styles are usually mixed with other features. We can isolate the styles using VAEs conditioned by class labels, known as conditional VAEs (CVAEs). However, methods to extract only styles using unlabeled data are not established. In this paper, we construct a CVAE-based method that extracts style features using only unlabeled data. The proposed model roughly consists of two parallel parts; a contrastive learning (CL) part that extracts style-independent features and a CVAE part that extracts style features. CL models generally learn representations independent of data augmentation, which can be seen as a perturbation in styles, in a self-supervised way. Taking the style-independent features as a condition, the CVAE learns to extract only styles. In the training procedure, a CL model is trained beforehand, and then the CVAE is trained while the CL model is fixed. Additionally, to prevent the CVAE from learning to ignore the condition and failing to extract only styles, we introduce a constraint based on mutual information between the CL features and the VAE features. Experiments on two simple datasets, MNIST and an original dataset based on Google Fonts, show that the proposed method efficiently extracts style features. Further experiments using real-world natural image datasets also show the method's extendability.
    Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data. (arXiv:2207.07506v3 [cs.LG] UPDATED)
    Detecting out-of-distribution (OOD) data is a task that is receiving an increasing amount of research attention in the domain of deep learning for computer vision. However, the performance of detection methods is generally evaluated on the task in isolation, rather than also considering potential downstream tasks in tandem. In this work, we examine selective classification in the presence of OOD data (SCOD). That is to say, the motivation for detecting OOD samples is to reject them so their impact on the quality of predictions is reduced. We show under this task specification, that existing post-hoc methods perform quite differently compared to when evaluated only on OOD detection. This is because it is no longer an issue to conflate in-distribution (ID) data with OOD data if the ID data is going to be misclassified. However, the conflation within ID data of correct and incorrect predictions becomes undesirable. We also propose a novel method for SCOD, Softmax Information Retaining Combination (SIRC), that augments softmax-based confidence scores with feature-agnostic information such that their ability to identify OOD samples is improved without sacrificing separation between correct and incorrect ID predictions. Experiments on a wide variety of ImageNet-scale datasets and convolutional neural network architectures show that SIRC is able to consistently match or outperform the baseline for SCOD, whilst existing OOD detection methods fail to do so.
    Training Stronger Spiking Neural Networks with Biomimetic Adaptive Internal Association Neurons. (arXiv:2207.11670v2 [cs.NE] UPDATED)
    As the third generation of neural networks, spiking neural networks (SNNs) are dedicated to exploring more insightful neural mechanisms to achieve near-biological intelligence. Intuitively, biomimetic mechanisms are crucial to understanding and improving SNNs. For example, the associative long-term potentiation (ALTP) phenomenon suggests that in addition to learning mechanisms between neurons, there are associative effects within neurons. However, most existing methods only focus on the former and lack exploration of the internal association effects. In this paper, we propose a novel Adaptive Internal Association~(AIA) neuron model to establish previously ignored influences within neurons. Consistent with the ALTP phenomenon, the AIA neuron model is adaptive to input stimuli, and internal associative learning occurs only when both dendrites are stimulated at the same time. In addition, we employ weighted weights to measure internal associations and introduce intermediate caches to reduce the volatility of associations. Extensive experiments on prevailing neuromorphic datasets show that the proposed method can potentiate or depress the firing of spikes more specifically, resulting in better performance with fewer spikes. It is worth noting that without adding any parameters at inference, the AIA model achieves state-of-the-art performance on DVS-CIFAR10~(83.9\%) and N-CARS~(95.64\%) datasets.
    Uni-RXN: A Unified Framework Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation. (arXiv:2303.06965v2 [cs.LG] UPDATED)
    Chemical reactions are the fundamental building blocks of drug design and organic chemistry research. In recent years, there has been a growing need for a large-scale deep-learning framework that can efficiently capture the basic rules of chemical reactions. In this paper, we have proposed a unified framework that addresses both the reaction representation learning and molecule generation tasks, which allows for a more holistic approach. Inspired by the organic chemistry mechanism, we develop a novel pretraining framework that enables us to incorporate inductive biases into the model. Our framework achieves state-of-the-art results on challenging downstream tasks. By possessing chemical knowledge, this framework can be applied to reaction-based generative models, overcoming the limitations of current molecule generation models that rely on a small number of reaction templates. In the extensive experiments, our model generates synthesizable drug-like structures of high quality. Overall, our work presents a significant step toward a large-scale deep-learning framework for a variety of reaction-based applications.
    Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice. (arXiv:2303.08102v1 [cs.LG])
    We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions. Improving on previous analyses, we show that the regret in this setting is controlled by information-theoretic quantities that measure the similarity between experts. In some natural special cases, this allows us to obtain the first regret bound for EXP4 that can get arbitrarily close to zero if the experts are similar enough. While for a different algorithm, we provide another bound that describes the similarity between the experts in terms of the KL-divergence, and we show that this bound can be smaller than the one of EXP4 in some cases. Additionally, we provide lower bounds for certain classes of experts showing that the algorithms we analyzed are nearly optimal in some cases.
    Symbolic Synthesis of Neural Networks. (arXiv:2303.03340v2 [cs.NE] UPDATED)
    Neural networks adapt very well to distributed and continuous representations, but struggle to generalize from small amounts of data. Symbolic systems commonly achieve data efficient generalization by exploiting modularity to benefit from local and discrete features of a representation. These features allow symbolic programs to be improved one module at a time and to experience combinatorial growth in the values they can successfully process. However, it is difficult to design a component that can be used to form symbolic abstractions and which is adequately overparametrized to learn arbitrary high-dimensional transformations. I present Graph-based Symbolically Synthesized Neural Networks (G-SSNNs), a class of neural modules that operate on representations modified with synthesized symbolic programs to include a fixed set of local and discrete features. I demonstrate that the choice of injected features within a G-SSNN module modulates the data efficiency and generalization of baseline neural models, creating predictable patterns of both heightened and curtailed generalization. By training G-SSNNs, we also derive information about desirable semantics of symbolic programs without manual engineering. This information is compact and amenable to abstraction, but can also be flexibly recontextualized for other high-dimensional settings. In future work, I will investigate data efficient generalization and the transferability of learned symbolic representations in more complex G-SSNN designs based on more complex classes of symbolic programs. Experimental code and data are available at https://github.com/shlomenu/symbolically_synthesized_networks .
    A law of adversarial risk, interpolation, and label noise. (arXiv:2207.03933v3 [stat.ML] UPDATED)
    In supervised learning, it has been shown that label noise in the data can be interpolated without penalties on test accuracy. We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem showing the relationship between label noise and adversarial risk for any data distribution. Our results are almost tight if we do not make any assumptions on the inductive bias of the learning algorithm. We then investigate how different components of this problem affect this result, including properties of the distribution. We also discuss non-uniform label noise distributions; and prove a new theorem showing uniform label noise induces nearly as large an adversarial risk as the worst poisoning with the same noise rate. Then, we provide theoretical and empirical evidence that uniform label noise is more harmful than typical real-world label noise. Finally, we show how inductive biases amplify the effect of label noise and argue the need for future work in this direction.
    Improving physics-informed neural networks with meta-learned optimization. (arXiv:2303.07127v2 [cs.LG] UPDATED)
    We show that the error achievable using physics-informed neural networks for solving systems of differential equations can be substantially reduced when these networks are trained using meta-learned optimization methods rather than to using fixed, hand-crafted optimizers as traditionally done. We choose a learnable optimization method based on a shallow multi-layer perceptron that is meta-trained for specific classes of differential equations. We illustrate meta-trained optimizers for several equations of practical relevance in mathematical physics, including the linear advection equation, Poisson's equation, the Korteweg--de Vries equation and Burgers' equation. We also illustrate that meta-learned optimizers exhibit transfer learning abilities, in that a meta-trained optimizer on one differential equation can also be successfully deployed on another differential equation.
    A Concept Knowledge Graph for User Next Intent Prediction at Alipay. (arXiv:2301.00503v3 [cs.CL] UPDATED)
    This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. To explicitly characterize user intent, we propose AlipayKG, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.
    Reachability Analysis of Neural Networks with Uncertain Parameters. (arXiv:2303.07917v1 [eess.SY])
    The literature on reachability analysis methods for neural networks currently only focuses on uncertainties on the network's inputs. In this paper, we introduce two new approaches for the reachability analysis of neural networks with additional uncertainties on their internal parameters (weight matrices and bias vectors of each layer), which may open the field of formal methods on neural networks to new topics, such as safe training or network repair. The first and main method that we propose relies on existing reachability analysis approach based on mixed monotonicity (initially introduced for dynamical systems). The second proposed approach extends the ESIP (Error-based Symbolic Interval Propagation) approach which was first implemented in the verification tool Neurify, and first mentioned in the publication of the tool VeriNet. Although the ESIP approach has been shown to often outperform the mixed-monotonicity reachability analysis in the classical case with uncertainties only on the network's inputs, we show in this paper through numerical simulations that the situation is greatly reversed (in terms of precision, computation time, memory usage, and broader applicability) when dealing with uncertainties on the weights and biases.
    Generalization of generative model for neuronal ensemble inference method. (arXiv:2211.05634v2 [q-bio.NC] UPDATED)
    Various brain functions that are necessary to maintain life activities materialize through the interaction of countless neurons. Therefore, it is important to analyze the structure of functional neuronal network. To elucidate the mechanism of brain function, many studies are being actively conducted on the structure of functional neuronal ensemble and hub, including all areas of neuroscience. In addition, recent study suggests that the existence of functional neuronal ensembles and hubs contributes to the efficiency of information processing. For these reasons, there is a demand for methods to infer functional neuronal ensembles from neuronal activity data, and methods based on Bayesian inference have been proposed. However, there is a problem in modeling the activity in Bayesian inference. The features of each neuron's activity have non-stationarity depending on physiological experimental conditions. As a result, the assumption of stationarity in Bayesian inference model impedes inference, which leads to destabilization of inference results and degradation of inference accuracy. In this study, we extend the range of the variable for expressing the neuronal state, and generalize the likelihood of the model for extended variables. By comparing with the previous study, our model can express the neuronal state in larger space. This generalization without restriction of the binary input enables us to perform soft clustering and apply the method to non-stationary neuroactivity data. In addition, for the effectiveness of the method, we apply the developed method to multiple synthetic fluorescence data generated from the electrical potential data in leaky integrated-and-fire model.
    Bayesian Prompt Learning for Image-Language Model Generalization. (arXiv:2210.02390v2 [cs.CV] UPDATED)
    Foundational image-language models have generated considerable interest due to their efficient adaptation to downstream tasks by prompt learning. Prompt learning treats part of the language model input as trainable while freezing the rest, and optimizes an Empirical Risk Minimization objective. However, Empirical Risk Minimization is known to suffer from distributional shifts which hurt generalizability to prompts unseen during training. By leveraging the regularization ability of Bayesian methods, we frame prompt learning from the Bayesian perspective and formulate it as a variational inference problem. Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts. Our framework is implemented by modeling the input prompt space in a probabilistic manner, as an a priori distribution which makes our proposal compatible with prompt learning approaches that are unconditional or conditional on the image. We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space, prevents learning spurious features, and exploits transferable invariant features. This results in better generalization of unseen prompts, even across different datasets and domains.
    Quantifying Causes of Arctic Amplification via Deep Learning based Time-series Causal Inference. (arXiv:2303.07122v2 [cs.AI] UPDATED)
    The warming of the Arctic, also known as Arctic amplification, is led by several atmospheric and oceanic drivers, however, the details of its underlying thermodynamic causes are still unknown. Inferring the causal effects of atmospheric processes on sea ice melt using fixed treatment effect strategies leads to unrealistic counterfactual estimations. Such models are also prone to bias due to time-varying confoundedness. In order to tackle these challenges, we propose TCINet - time-series causal inference model to infer causation under continuous treatment using recurrent neural networks. Through experiments on synthetic and observational data, we show how our research can substantially improve the ability to quantify the leading causes of Arctic sea ice melt.
    Training Robust Spiking Neural Networks on Neuromorphic Data with Spatiotemporal Fragments. (arXiv:2207.11659v3 [cs.CV] UPDATED)
    Neuromorphic vision sensors (event cameras) are inherently suitable for spiking neural networks (SNNs) and provide novel neuromorphic vision data for this biomimetic model. Due to the spatiotemporal characteristics, novel data augmentations are required to process the unconventional visual signals of these cameras. In this paper, we propose a novel Event SpatioTemporal Fragments (ESTF) augmentation method. It preserves the continuity of neuromorphic data by drifting or inverting fragments of the spatiotemporal event stream to simulate the disturbance of brightness variations, leading to more robust spiking neural networks. Extensive experiments are performed on prevailing neuromorphic datasets. It turns out that ESTF provides substantial improvements over pure geometric transformations and outperforms other event data augmentation methods. It is worth noting that the SNNs with ESTF achieve the state-of-the-art accuracy of 83.9\% on the CIFAR10-DVS dataset.
    Decision Making for Human-in-the-loop Robotic Agents via Uncertainty-Aware Reinforcement Learning. (arXiv:2303.06710v2 [cs.RO] UPDATED)
    In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly autonomously in solving a task, but can request help from an external expert when needed. However, knowing when to request such assistance is critical: too few requests can lead to the robot making mistakes, but too many requests can overload the expert. In this paper, we present a Reinforcement Learning based approach to this problem, where a semi-autonomous agent asks for external assistance when it has low confidence in the eventual success of the task. The confidence level is computed by estimating the variance of the return from the current state. We show that this estimate can be iteratively improved during training using a Bellman-like recursion. On discrete navigation problems with both fully- and partially-observable state information, we show that our method makes effective use of a limited budget of expert calls at run-time, despite having no access to the expert at training time.
    Dataset Distillation Using Parameter Pruning. (arXiv:2209.14609v4 [cs.CV] UPDATED)
    In many fields, the acquisition of advanced models depends on large datasets, making data storage and model training expensive. As a solution, dataset distillation can synthesize a small dataset that preserves most information of the original large dataset. The recently proposed dataset distillation method by matching network parameters has been proven effective for several datasets. However, the dimensions of network parameters are typically large. Furthermore, some parameters are difficult to match during the distillation process, degrading distillation performance. Based on this observation, this study proposes a novel dataset distillation method based on parameter pruning that solves the problem. The proposed method can synthesize more robust distilled datasets and improve distillation performance by pruning difficult-to-match parameters during the distillation process. Experimental results on three datasets show that the proposed method outperforms other state-of-the-art dataset distillation methods.
    Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes. (arXiv:2210.10691v2 [cs.RO] UPDATED)
    While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue by a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents applying potentially unsafe actions from a reinforcement learning agent by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which enables to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
    Statistical Hardware Design With Multi-model Active Learning. (arXiv:2303.08054v1 [cs.AR])
    With the rising complexity of numerous novel applications that serve our modern society comes the strong need to design efficient computing platforms. Designing efficient hardware is, however, a complex multi-objective problem that deals with multiple parameters and their interactions. Given that there are a large number of parameters and objectives involved in hardware design, synthesizing all possible combinations is not a feasible method to find the optimal solution. One promising approach to tackle this problem is statistical modeling of a desired hardware performance. Here, we propose a model-based active learning approach to solve this problem. Our proposed method uses Bayesian models to characterize various aspects of hardware performance. We also use transfer learning and Gaussian regression bootstrapping techniques in conjunction with active learning to create more accurate models. Our proposed statistical modeling method provides hardware models that are sufficiently accurate to perform design space exploration as well as performance prediction simultaneously. We use our proposed method to perform design space exploration and performance prediction for various hardware setups, such as micro-architecture design and OpenCL kernels for FPGA targets. Our experiments show that the number of samples required to create performance models significantly reduces while maintaining the predictive power of our proposed statistical models. For instance, in our performance prediction setting, the proposed method needs 65\% fewer samples to create the model, and in the design space exploration setting, our proposed method can find the best parameter settings by exploring less than 50 samples.
    A Diffusion Model Predicts 3D Shapes from 2D Microscopy Images. (arXiv:2208.14125v3 [cs.CV] UPDATED)
    Diffusion models are a special type of generative model, capable of synthesising new data from a learnt distribution. We introduce DISPR, a diffusion-based model for solving the inverse problem of three-dimensional (3D) cell shape prediction from two-dimensional (2D) single cell microscopy images. Using the 2D microscopy image as a prior, DISPR is conditioned to predict realistic 3D shape reconstructions. To showcase the applicability of DISPR as a data augmentation tool in a feature-based single cell classification task, we extract morphological features from the red blood cells grouped into six highly imbalanced classes. Adding features from the DISPR predictions to the three minority classes improved the macro F1 score from $F1_\text{macro} = 55.2 \pm 4.6\%$ to $F1_\text{macro} = 72.2 \pm 4.9\%$. We thus demonstrate that diffusion models can be successfully applied to inverse biomedical problems, and that they learn to reconstruct 3D shapes with realistic morphological features from 2D microscopy images.
    Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization. (arXiv:2209.15382v2 [cs.LG] UPDATED)
    We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-linear policy parametrizations in infinite-horizon discounted Markov decision processes. In the deterministic case, when the Q-value is known and can be approximated by a linear combination of a known feature function up to a bias error, we show that a geometrically-increasing step size yields a linear convergence rate towards an optimal policy. We then consider the sample-based case, when the best representation of the Q- value function among linear combinations of a known feature function is known up to an estimation error. In this setting, we show that the algorithm enjoys the same linear guarantees as in the deterministic case up to an error term that depends on the estimation error, the bias error, and the condition number of the feature covariance matrix. Our results build upon the general framework of policy mirror descent and extend previous findings for the softmax tabular parametrization to the log-linear policy class.
    Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP. (arXiv:2210.04150v2 [cs.CV] UPDATED)
    Open-vocabulary semantic segmentation aims to segment an image into semantic regions according to text descriptions, which may not have been seen during training. Recent two-stage methods first generate class-agnostic mask proposals and then leverage pre-trained vision-language models, e.g., CLIP, to classify masked regions. We identify the performance bottleneck of this paradigm to be the pre-trained CLIP model, since it does not perform well on masked images. To address this, we propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions. We collect training data by mining an existing image-caption dataset (e.g., COCO Captions), using CLIP to match masked image regions to nouns in the image captions. Compared with the more precise and manually annotated segmentation labels with fixed classes (e.g., COCO-Stuff), we find our noisy but diverse dataset can better retain CLIP's generalization ability. Along with finetuning the entire model, we utilize the "blank" areas in masked images using a method we dub mask prompt tuning. Experiments demonstrate mask prompt tuning brings significant improvement without modifying any weights of CLIP, and it can further improve a fully finetuned model. In particular, when trained on COCO and evaluated on ADE20K-150, our best model achieves 29.6% mIoU, which is +8.5% higher than the previous state-of-the-art. For the first time, open-vocabulary generalist models match the performance of supervised specialist models in 2017 without dataset-specific adaptations.
    A new Potential-Based Reward Shaping for Reinforcement Learning Agent. (arXiv:1902.06239v3 [cs.AI] UPDATED)
    Potential-based reward shaping (PBRS) is a particular category of machine learning methods which aims to improve the learning speed of a reinforcement learning agent by extracting and utilizing extra knowledge while performing a task. There are two steps in the process of transfer learning: extracting knowledge from previously learned tasks and transferring that knowledge to use it in a target task. The latter step is well discussed in the literature with various methods being proposed for it, while the former has been explored less. With this in mind, the type of knowledge that is transmitted is very important and can lead to considerable improvement. Among the literature of both the transfer learning and the potential-based reward shaping, a subject that has never been addressed is the knowledge gathered during the learning process itself. In this paper, we presented a novel potential-based reward shaping method that attempted to extract knowledge from the learning process. The proposed method extracts knowledge from episodes' cumulative rewards. The proposed method has been evaluated in the Arcade learning environment and the results indicate an improvement in the learning process in both the single-task and the multi-task reinforcement learner agents.
    Epicasting: An Ensemble Wavelet Neural Network (EWNet) for Forecasting Epidemics. (arXiv:2206.10696v3 [cs.LG] UPDATED)
    Infectious diseases remain among the top contributors to human illness and death worldwide, among which many diseases produce epidemic waves of infection. The unavailability of specific drugs and ready-to-use vaccines to prevent most of these epidemics makes the situation worse. These force public health officials and policymakers to rely on early warning systems generated by reliable and accurate forecasts of epidemics. Accurate forecasts of epidemics can assist stakeholders in tailoring countermeasures, such as vaccination campaigns, staff scheduling, and resource allocation, to the situation at hand, which could translate to reductions in the impact of a disease. Unfortunately, most of these past epidemics exhibit nonlinear and non-stationary characteristics due to their spreading fluctuations based on seasonal-dependent variability and the nature of these epidemics. We analyse a wide variety of epidemic time series datasets using a maximal overlap discrete wavelet transform (MODWT) based autoregressive neural network and call it EWNet model. MODWT techniques effectively characterize non-stationary behavior and seasonal dependencies in the epidemic time series and improve the nonlinear forecasting scheme of the autoregressive neural network in the proposed ensemble wavelet network framework. From a nonlinear time series viewpoint, we explore the asymptotic stationarity of the proposed EWNet model to show the asymptotic behavior of the associated Markov Chain. We also theoretically investigate the effect of learning stability and the choice of hidden neurons in the proposal. From a practical perspective, we compare our proposed EWNet framework with several statistical, machine learning, and deep learning models. Experimental results show that the proposed EWNet is highly competitive compared to the state-of-the-art epidemic forecasting methods.
    Pseudo-Inverted Bottleneck Convolution for DARTS Search Space. (arXiv:2301.01286v2 [cs.LG] UPDATED)
    Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based neural architecture search method. Since the introduction of DARTS, there has been little work done on adapting the action space based on state-of-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. We introduce the Pseudo-Inverted Bottleneck Conv (PIBConv) block intending to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and outperforms a DARTS network with similar size significantly, at layer counts as small as 2. Furthermore, with less layers, not only does it achieve higher accuracy with lower computational footprint (measured in GMACs) and parameter count, GradCAM comparisons show that our network can better detect distinctive features of target objects compared to DARTS. Code is available from https://github.com/mahdihosseini/PIBConv.
    Component Segmentation of Engineering Drawings Using Graph Convolutional Networks. (arXiv:2212.00290v2 [cs.CV] UPDATED)
    We present a data-driven framework to automate the vectorization and machine interpretation of 2D engineering part drawings. In industrial settings, most manufacturing engineers still rely on manual reads to identify the topological and manufacturing requirements from drawings submitted by designers. The interpretation process is laborious and time-consuming, which severely inhibits the efficiency of part quotation and manufacturing tasks. While recent advances in image-based computer vision methods have demonstrated great potential in interpreting natural images through semantic segmentation approaches, the application of such methods in parsing engineering technical drawings into semantically accurate components remains a significant challenge. The severe pixel sparsity in engineering drawings also restricts the effective featurization of image-based data-driven methods. To overcome these challenges, we propose a deep learning based framework that predicts the semantic type of each vectorized component. Taking a raster image as input, we vectorize all components through thinning, stroke tracing, and cubic bezier fitting. Then a graph of such components is generated based on the connectivity between the components. Finally, a graph convolutional neural network is trained on this graph data to identify the semantic type of each component. We test our framework in the context of semantic segmentation of text, dimension and, contour components in engineering drawings. Results show that our method yields the best performance compared to recent image, and graph-based segmentation methods.
    Combinatorial Pure Exploration of Causal Bandits. (arXiv:2206.07883v3 [cs.LG] UPDATED)
    The combinatorial pure exploration of causal bandits is the following online learning task: given a causal graph with unknown causal inference distributions, in each round we choose a subset of variables to intervene or do no intervention, and observe the random outcomes of all random variables, with the goal that using as few rounds as possible, we can output an intervention that gives the best (or almost best) expected outcome on the reward variable $Y$ with probability at least $1-\delta$, where $\delta$ is a given confidence level. We provide the first gap-dependent and fully adaptive pure exploration algorithms on two types of causal models -- the binary generalized linear model (BGLM) and general graphs. For BGLM, our algorithm is the first to be designed specifically for this setting and achieves polynomial sample complexity, while all existing algorithms for general graphs have either sample complexity exponential to the graph size or some unreasonable assumptions. For general graphs, our algorithm provides a significant improvement on sample complexity, and it nearly matches the lower bound we prove. Our algorithms achieve such improvement by a novel integration of prior causal bandit algorithms and prior adaptive pure exploration algorithms, the former of which utilize the rich observational feedback in causal bandits but are not adaptive to reward gaps, while the latter of which have the issue in reverse.
    Dynamic Efficient Adversarial Training Guided by Gradient Magnitude. (arXiv:2103.03076v2 [cs.LG] UPDATED)
    Adversarial training is an effective but time-consuming way to train robust deep neural networks that can withstand strong adversarial attacks. As a response to its inefficiency, we propose Dynamic Efficient Adversarial Training (DEAT), which gradually increases the adversarial iteration during training. We demonstrate that the gradient's magnitude correlates with the curvature of the trained model's loss landscape, allowing it to reflect the effect of adversarial training. Therefore, based on the magnitude of the gradient, we propose a general acceleration strategy, M+ acceleration, which enables an automatic and highly effective method of adjusting the training procedure. M+ acceleration is computationally efficient and easy to implement. It is suited for DEAT and compatible with the majority of existing adversarial training techniques. Extensive experiments have been done on CIFAR-10 and ImageNet datasets with various training environments. The results show that the proposed M+ acceleration significantly improves the training efficiency of existing adversarial training methods while achieving similar robustness performance. This demonstrates that the strategy is highly adaptive and offers a valuable solution for automatic adversarial training.
    Domain Generalization in Machine Learning Models for Wireless Communications: Concepts, State-of-the-Art, and Open Issues. (arXiv:2303.08106v1 [cs.LG])
    Data-driven machine learning (ML) is promoted as one potential technology to be used in next-generations wireless systems. This led to a large body of research work that applies ML techniques to solve problems in different layers of the wireless transmission link. However, most of these applications rely on supervised learning which assumes that the source (training) and target (test) data are independent and identically distributed (i.i.d). This assumption is often violated in the real world due to domain or distribution shifts between the source and the target data. Thus, it is important to ensure that these algorithms generalize to out-of-distribution (OOD) data. In this context, domain generalization (DG) tackles the OOD-related issues by learning models on different and distinct source domains/datasets with generalization capabilities to unseen new domains without additional finetuning. Motivated by the importance of DG requirements for wireless applications, we present a comprehensive overview of the recent developments in DG and the different sources of domain shift. We also summarize the existing DG methods and review their applications in selected wireless communication problems, and conclude with insights and open questions.
    Adaptive-SpikeNet: Event-based Optical Flow Estimation using Spiking Neural Networks with Learnable Neuronal Dynamics. (arXiv:2209.11741v2 [cs.CV] UPDATED)
    Event-based cameras have recently shown great potential for high-speed motion estimation owing to their ability to capture temporally rich information asynchronously. Spiking Neural Networks (SNNs), with their neuro-inspired event-driven processing can efficiently handle such asynchronous data, while neuron models such as the leaky-integrate and fire (LIF) can keep track of the quintessential timing information contained in the inputs. SNNs achieve this by maintaining a dynamic state in the neuron memory, retaining important information while forgetting redundant data over time. Thus, we posit that SNNs would allow for better performance on sequential regression tasks compared to similarly sized Analog Neural Networks (ANNs). However, deep SNNs are difficult to train due to vanishing spikes at later layers. To that effect, we propose an adaptive fully-spiking framework with learnable neuronal dynamics to alleviate the spike vanishing problem. We utilize surrogate gradient-based backpropagation through time (BPTT) to train our deep SNNs from scratch. We validate our approach for the task of optical flow estimation on the Multi-Vehicle Stereo Event-Camera (MVSEC) dataset and the DSEC-Flow dataset. Our experiments on these datasets show an average reduction of 13% in average endpoint error (AEE) compared to state-of-the-art ANNs. We also explore several down-scaled models and observe that our SNN models consistently outperform similarly sized ANNs offering 10%-16% lower AEE. These results demonstrate the importance of SNNs for smaller models and their suitability at the edge. In terms of efficiency, our SNNs offer substantial savings in network parameters (48.3x) and computational energy (10.2x) while attaining ~10% lower EPE compared to the state-of-the-art ANN implementations.
    Neural Network Compression for Noisy Storage Devices. (arXiv:2102.07725v2 [cs.LG] UPDATED)
    Compression and efficient storage of neural network (NN) parameters is critical for applications that run on resource-constrained devices. Despite the significant progress in NN model compression, there has been considerably less investigation in the actual \textit{physical} storage of NN parameters. Conventionally, model compression and physical storage are decoupled, as digital storage media with error-correcting codes (ECCs) provide robust error-free storage. However, this decoupled approach is inefficient as it ignores the overparameterization present in most NNs and forces the memory device to allocate the same amount of resources to every bit of information regardless of its importance. In this work, we investigate analog memory devices as an alternative to digital media -- one that naturally provides a way to add more protection for significant bits unlike its counterpart, but is noisy and may compromise the stored model's performance if used naively. We develop a variety of robust coding strategies for NN weight storage on analog devices, and propose an approach to jointly optimize model compression and memory resource allocation. We then demonstrate the efficacy of our approach on models trained on MNIST, CIFAR-10 and ImageNet datasets for existing compression techniques. Compared to conventional error-free digital storage, our method reduces the memory footprint by up to one order of magnitude, without significantly compromising the stored model's accuracy.
    Ensemble forecasts in reproducing kernel Hilbert space family: dynamical systems in Wonderland. (arXiv:2207.14653v2 [math-ph] UPDATED)
    A methodological framework for ensemble-based estimation and simulation of high dimensional dynamical systems such as the oceanic or atmospheric flows is proposed. To that end, the dynamical system is embedded in a family of reproducing kernel Hilbert spaces with kernel functions driven by the dynamics. This family is nicknamed Wonderland for its appealing properties. In Wonderland the Koopman and Perron-Frobenius operators are unitary and uniformly continuous. This property warrants they can be expressed in exponential series of diagonalizable bounded infinitesimal generators. Access to Lyapunov exponents and to exact ensemble based expressions of the tangent linear dynamics are directly available as well. Wonderland enables us the devise of strikingly simple ensemble data assimilation methods for trajectory reconstructions in terms of constant-in-time linear combinations of trajectory samples. Such an embarrassingly simple strategy is made possible through a fully justified superposition principle ensuing from several fundamental theorems.
    Optimizing Quantum Federated Learning Based on Federated Quantum Natural Gradient Descent. (arXiv:2303.08116v1 [quant-ph])
    Quantum federated learning (QFL) is a quantum extension of the classical federated learning model across multiple local quantum devices. An efficient optimization algorithm is always expected to minimize the communication overhead among different quantum participants. In this work, we propose an efficient optimization algorithm, namely federated quantum natural gradient descent (FQNGD), and further, apply it to a QFL framework that is composed of a variational quantum circuit (VQC)-based quantum neural networks (QNN). Compared with stochastic gradient descent methods like Adam and Adagrad, the FQNGD algorithm admits much fewer training iterations for the QFL to get converged. Moreover, it can significantly reduce the total communication overhead among local quantum devices. Our experiments on a handwritten digit classification dataset justify the effectiveness of the FQNGD for the QFL framework in terms of a faster convergence rate on the training set and higher accuracy on the test set.
    Learning Audio-Visual Dereverberation. (arXiv:2106.07732v2 [cs.SD] UPDATED)
    Reverberation not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition. Prior work attempts to remove reverberation based on the audio modality only. Our idea is to learn to dereverberate speech from audio-visual observations. The visual environment surrounding a human speaker reveals important cues about the room geometry, materials, and speaker location, all of which influence the precise reverberation effects. We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed monaural sound and visual scene. In support of this new task, we develop a large-scale dataset SoundSpaces-Speech that uses realistic acoustic renderings of speech in real-world 3D scans of homes offering a variety of room acoustics. Demonstrating our approach on both simulated and real imagery for speech enhancement, speech recognition, and speaker identification, we show it achieves state-of-the-art performance and substantially improves over audio-only methods.
    Investigating Deep Learning Model Calibration for Classification Problems in Mechanics. (arXiv:2212.00881v2 [cs.LG] UPDATED)
    Recently, there has been a growing interest in applying machine learning methods to problems in engineering mechanics. In particular, there has been significant interest in applying deep learning techniques to predicting the mechanical behavior of heterogeneous materials and structures. Researchers have shown that deep learning methods are able to effectively predict mechanical behavior with low error for systems ranging from engineered composites, to geometrically complex metamaterials, to heterogeneous biological tissue. However, there has been comparatively little attention paid to deep learning model calibration, i.e., the match between predicted probabilities of outcomes and the true probabilities of outcomes. In this work, we perform a comprehensive investigation into ML model calibration across seven open access engineering mechanics datasets that cover three distinct types of mechanical problems. Specifically, we evaluate both model and model calibration error for multiple machine learning methods, and investigate the influence of ensemble averaging and post hoc model calibration via temperature scaling. Overall, we find that ensemble averaging of deep neural networks is both an effective and consistent tool for improving model calibration, while temperature scaling has comparatively limited benefits. Looking forward, we anticipate that this investigation will lay the foundation for future work in developing mechanics specific approaches to deep learning model calibration.
    Learning Representations of Bi-level Knowledge Graphs for Reasoning beyond Link Prediction. (arXiv:2302.02601v2 [cs.LG] UPDATED)
    Knowledge graphs represent known facts using triplets. While existing knowledge graph embedding methods only consider the connections between entities, we propose considering the relationships between triplets. For example, let us consider two triplets $T_1$ and $T_2$ where $T_1$ is (Academy_Awards, Nominates, Avatar) and $T_2$ is (Avatar, Wins, Academy_Awards). Given these two base-level triplets, we see that $T_1$ is a prerequisite for $T_2$. In this paper, we define a higher-level triplet to represent a relationship between triplets, e.g., $\langle T_1$, PrerequisiteFor, $T_2\rangle$ where PrerequisiteFor is a higher-level relation. We define a bi-level knowledge graph that consists of the base-level and the higher-level triplets. We also propose a data augmentation strategy based on the random walks on the bi-level knowledge graph to augment plausible triplets. Our model called BiVE learns embeddings by taking into account the structures of the base-level and the higher-level triplets, with additional consideration of the augmented triplets. We propose two new tasks: triplet prediction and conditional link prediction. Given a triplet $T_1$ and a higher-level relation, the triplet prediction predicts a triplet that is likely to be connected to $T_1$ by the higher-level relation, e.g., $\langle T_1$, PrerequisiteFor, ?$\rangle$. The conditional link prediction predicts a missing entity in a triplet conditioned on another triplet, e.g., $\langle T_1$, PrerequisiteFor, (Avatar, Wins, ?)$\rangle$. Experimental results show that BiVE significantly outperforms all other methods in the two new tasks and the typical base-level link prediction in real-world bi-level knowledge graphs.
    Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs. (arXiv:2303.08114v1 [cs.LG])
    Training data attribution (TDA) methods offer to trace a model's prediction on any given example back to specific influential training examples. Existing approaches do so by assigning a scalar influence score to each training example, under a simplifying assumption that influence is additive. But in reality, we observe that training examples interact in highly non-additive ways due to factors such as inter-example redundancy, training order, and curriculum learning effects. To study such interactions, we propose Simfluence, a new paradigm for TDA where the goal is not to produce a single influence score per example, but instead a training run simulator: the user asks, ``If my model had trained on example $z_1$, then $z_2$, ..., then $z_n$, how would it behave on $z_{test}$?''; the simulator should then output a simulated training run, which is a time series predicting the loss on $z_{test}$ at every step of the simulated run. This enables users to answer counterfactual questions about what their model would have learned under different training curricula, and to directly see where in training that learning would occur. We present a simulator, Simfluence-Linear, that captures non-additive interactions and is often able to predict the spiky trajectory of individual example losses with surprising fidelity. Furthermore, we show that existing TDA methods such as TracIn and influence functions can be viewed as special cases of Simfluence-Linear. This enables us to directly compare methods in terms of their simulation accuracy, subsuming several prior TDA approaches to evaluation. In experiments on large language model (LLM) fine-tuning, we show that our method predicts loss trajectories with much higher accuracy than existing TDA methods (doubling Spearman's correlation and reducing mean-squared error by 75%) across several tasks, models, and training methods.
    Variational Inference with Gaussian Mixture by Entropy Approximation. (arXiv:2202.13059v3 [stat.ML] UPDATED)
    Variational inference is a technique for approximating intractable posterior distributions in order to quantify the uncertainty of machine learning. Although the unimodal Gaussian distribution is usually chosen as a parametric distribution, it hardly approximates the multimodality. In this paper, we employ the Gaussian mixture distribution as a parametric distribution. A main difficulty of variational inference with the Gaussian mixture is how to approximate the entropy of the Gaussian mixture. We approximate the entropy of the Gaussian mixture as the sum of the entropy of the unimodal Gaussian, which can be analytically calculated. In addition, we theoretically analyze the approximation error between the true entropy and approximated one in order to reveal when our approximation works well. Specifically, the approximation error is controlled by the ratios of the distances between the means to the sum of the variances of the Gaussian mixture. Furthermore, it converges to zero when the ratios go to infinity. This situation seems to be more likely to occur in higher dimensional parametric spaces because of the curse of dimensionality. Therefore, our result guarantees that our approximation works well, for example, in neural networks that assume a large number of weights.
    Training set cleansing of backdoor poisoning by self-supervised representation learning. (arXiv:2210.10272v2 [cs.LG] UPDATED)
    A backdoor or Trojan attack is an important type of data poisoning attack against deep neural network (DNN) classifiers, wherein the training dataset is poisoned with a small number of samples that each possess the backdoor pattern (usually a pattern that is either imperceptible or innocuous) and which are mislabeled to the attacker's target class. When trained on a backdoor-poisoned dataset, a DNN behaves normally on most benign test samples but makes incorrect predictions to the target class when the test sample has the backdoor pattern incorporated (i.e., contains a backdoor trigger). Here we focus on image classification tasks and show that supervised training may build stronger association between the backdoor pattern and the associated target class than that between normal features and the true class of origin. By contrast, self-supervised representation learning ignores the labels of samples and learns a feature embedding based on images' semantic content. %We thus propose to use unsupervised representation learning to avoid emphasising backdoor-poisoned training samples and learn a similar feature embedding for samples of the same class. Using a feature embedding found by self-supervised representation learning, a data cleansing method, which combines sample filtering and re-labeling, is developed. Experiments on CIFAR-10 benchmark datasets show that our method achieves state-of-the-art performance in mitigating backdoor attacks.
    Detection of Abuse in Financial Transaction Descriptions Using Machine Learning. (arXiv:2303.08016v1 [cs.CL])
    Since introducing changes to the New Payments Platform (NPP) to include longer messages as payment descriptions, it has been identified that people are now using it for communication, and in some cases, the system was being used as a targeted form of domestic and family violence. This type of tech-assisted abuse poses new challenges in terms of identification, actions and approaches to rectify this behaviour. Commonwealth Bank of Australia's Artificial Intelligence Labs team (CBA AI Labs) has developed a new system using advances in deep learning models for natural language processing (NLP) to create a powerful abuse detector that periodically scores all the transactions, and identifies cases of high-risk abuse in millions of records. In this paper, we describe the problem of tech-assisted abuse in the context of banking services, outline the developed model and its performance, and the operating framework more broadly.
    CycleSense: Detecting Near Miss Incidents in Bicycle Traffic from Mobile Motion Sensors. (arXiv:2204.10416v2 [cs.LG] UPDATED)
    In cities worldwide, cars cause health and traffic problems whichcould be partly mitigated through an increased modal share of bicycles. Many people, however, avoid cycling due to a lack of perceived safety. For city planners, addressing this is hard as they lack insights intowhere cyclists feel safe and where they do not. To gain such insights,we have in previous work proposed the crowdsourcing platform SimRa,which allows cyclists to record their rides and report near miss incidentsvia a smartphone app. In this paper, we present CycleSense, a combination of signal pro-cessing and Machine Learning techniques, which partially automatesthe detection of near miss incidents, thus making the reporting of nearmiss incidents easier. Using the SimRa data set, we evaluate CycleSenseby comparing it to a baseline method used by SimRa and show that itsignificantly improves incident detection.
    CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting. (arXiv:2303.06274v2 [cs.CV] UPDATED)
    Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we setup a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images of colon tissue. With around 700 million detected nuclei per model, associated features were used for dysplasia grading and survival analysis, where we demonstrated that the challenge's improvement over the previous state-of-the-art led to significant boosts in downstream performance. Our findings also suggest that eosinophils and neutrophils play an important role in the tumour microevironment. We release challenge models and WSI-level results to foster the development of further methods for biomarker discovery.
    Thought Flow Nets: From Single Predictions to Trains of Model Thought. (arXiv:2107.12220v2 [cs.LG] UPDATED)
    When humans solve complex problems, they typically create a sequence of ideas (involving an intuitive decision, reflection, error correction, etc.) in order to reach a conclusive decision. Contrary to this, today's models are mostly trained to map an input to one single and fixed output. In this paper, we investigate how we can give models the opportunity of a second, third and $k$-th thought. Taking inspiration from Hegel's dialectics, we propose the concept of a thought flow which creates a sequence of predictions. We present a self-correction mechanism that is trained to estimate the model's correctness and performs iterative prediction updates based on the correctness prediction's gradient. We introduce our method at the example of question answering and conduct extensive experiments that demonstrate (i) our method's ability to correct its own predictions and (ii) its potential to notably improve model performances. In addition, we conduct a qualitative analysis of thought flow correction patterns and explore how thought flow predictions affect human users within a crowdsourcing study. We find that (iii) thought flows enable improved user performance and are perceived as more natural, correct, and intelligent as single and/or top-3 predictions.
    TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR. (arXiv:2301.00656v2 [eess.AS] UPDATED)
    Self-supervised learning (SSL) models confront challenges of abrupt informational collapse or slow dimensional collapse. We propose TriNet, which introduces a novel triple-branch architecture for preventing collapse and stabilizing the pre-training. TriNet learns the SSL latent embedding space and incorporates it to a higher level space for predicting pseudo target vectors generated by a frozen teacher. Our experimental results show that the proposed method notably stabilizes and accelerates pre-training and achieves a relative word error rate reduction (WERR) of 6.06% compared to the state-of-the-art (SOTA) Data2vec for a downstream benchmark ASR task. We will release our code at https://github.com/tencent-ailab/.
    CB2: Collaborative Natural Language Interaction Research Platform. (arXiv:2303.08127v1 [cs.LG])
    CB2 is a multi-agent platform to study collaborative natural language interaction in a grounded task-oriented scenario. It includes a 3D game environment, a backend server designed to serve trained models to human agents, and various tools and processes to enable scalable studies. We deploy CB2 at https://cb2.ai as a system demonstration with a learned instruction following model.
    EdgeServe: An Execution Layer for Decentralized Prediction. (arXiv:2303.08028v1 [cs.DB])
    The relevant features for a machine learning task may be aggregated from data sources collected on different nodes in a network. This problem, which we call decentralized prediction, creates a number of interesting systems challenges in managing data routing, placing computation, and time-synchronization. This paper presents EdgeServe, a machine learning system that can serve decentralized predictions. EdgeServe relies on a low-latency message broker to route data through a network to nodes that can serve predictions. EdgeServe relies on a series of novel optimizations that can tradeoff computation, communication, and accuracy. We evaluate EdgeServe on three decentralized prediction tasks: (1) multi-camera object tracking, (2) network intrusion detection, and (3) human activity recognition.
    One-Step Abductive Multi-Target Learning with Diverse Noisy Samples and Its Application to Tumour Segmentation for Breast Cancer. (arXiv:2110.10325v9 [cs.LG] UPDATED)
    Recent studies have demonstrated the effectiveness of the combination of machine learning and logical reasoning, including data-driven logical reasoning, knowledge driven machine learning and abductive learning, in inventing advanced artificial intelligence technologies. One-step abductive multi-target learning (OSAMTL), an approach inspired by abductive learning, via simply combining machine learning and logical reasoning in a one-step balanced way, has as well shown its effectiveness in handling complex noisy labels of a single noisy sample in medical histopathology whole slide image analysis (MHWSIA). However, OSAMTL is not suitable for the situation where diverse noisy samples (DiNS) are provided for a learning task. In this paper, giving definition of DiNS, we propose one-step abductive multi-target learning with DiNS (OSAMTL-DiNS) to expand the original OSAMTL to handle complex noisy labels of DiNS. Applying OSAMTL-DiNS to tumour segmentation for breast cancer in MHWSIA, we show that OSAMTL-DiNS is able to enable various state-of-the-art approaches for learning from noisy labels to achieve more rational predictions.
    Expectation Distance-based Distributional Clustering for Noise-Robustness. (arXiv:2110.08871v4 [cs.LG] UPDATED)
    This paper presents a clustering technique that reduces the susceptibility to data noise by learning and clustering the data-distribution and then assigning the data to the cluster of its distribution. In the process, it reduces the impact of noise on clustering results. This method involves introducing a new distance among distributions, namely the expectation distance (denoted, ED), that goes beyond the state-of-art distribution distance of optimal mass transport (denoted, $W_2$ for $2$-Wasserstein): The latter essentially depends only on the marginal distributions while the former also employs the information about the joint distributions. Using the ED, the paper extends the classical $K$-means and $K$-medoids clustering to those over data-distributions (rather than raw-data) and introduces $K$-medoids using $W_2$. The paper also presents the closed-form expressions of the $W_2$ and ED distance measures. The implementation results of the proposed ED and the $W_2$ distance measures to cluster real-world weather data as well as stock data are also presented, which involves efficiently extracting and using the underlying data distributions -- Gaussians for weather data versus lognormals for stock data. The results show striking performance improvement over classical clustering of raw-data, with higher accuracy realized for ED. Also, not only does the distribution-based clustering offer higher accuracy, but it also lowers the computation time due to reduced time-complexity.
    Eliciting Latent Predictions from Transformers with the Tuned Lens. (arXiv:2303.08112v1 [cs.LG])
    We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer. To do so, we train an affine probe for each block in a frozen pretrained model, making it possible to decode every hidden state into a distribution over the vocabulary. Our method, the \emph{tuned lens}, is a refinement of the earlier ``logit lens'' technique, which yielded useful insights but is often brittle. We test our method on various autoregressive language models with up to 20B parameters, showing it to be more predictive, reliable and unbiased than the logit lens. With causal experiments, we show the tuned lens uses similar features to the model itself. We also find the trajectory of latent predictions can be used to detect malicious inputs with high accuracy. All code needed to reproduce our results can be found at https://github.com/AlignmentResearch/tuned-lens.
    Deep Learning-Based Estimation and Goodness-of-Fit for Large-Scale Confirmatory Item Factor Analysis. (arXiv:2109.09500v2 [stat.ML] UPDATED)
    We investigate novel parameter estimation and goodness-of-fit (GOF) assessment methods for large-scale confirmatory item factor analysis (IFA) with many respondents, items, and latent factors. For parameter estimation, we extend Urban and Bauer's (2021) deep learning algorithm for exploratory IFA to the confirmatory setting by showing how to handle constraints on loadings and factor correlations. For GOF assessment, we explore simulation-based tests and indices that extend the classifier two-sample test (C2ST), a method that tests whether a deep neural network can distinguish between observed data and synthetic data sampled from a fitted IFA model. Proposed extensions include a test of approximate fit wherein the user specifies what percentage of observed and synthetic data should be distinguishable as well as a relative fit index (RFI) that is similar in spirit to the RFIs used in structural equation modeling. Via simulation studies, we show that: (1) the confirmatory extension of Urban and Bauer's (2021) algorithm obtains comparable estimates to a state-of-the-art estimation procedure in less time; (2) C2ST-based GOF tests control the empirical type I error rate and detect when the latent dimensionality is misspecified; and (3) the sampling distribution of the C2ST-based RFI depends on the sample size.
    Pretrained Language Models are Symbolic Mathematics Solvers too!. (arXiv:2110.03501v3 [stat.ML] UPDATED)
    Solving symbolic mathematics has always been of in the arena of human ingenuity that needs compositional reasoning and recurrence. However, recent studies have shown that large-scale language models such as transformers are universal and surprisingly can be trained as a sequence-to-sequence task to solve complex mathematical equations. These large transformer models need humongous amounts of training data to generalize to unseen symbolic mathematics problems. In this paper, we present a sample efficient way of solving the symbolic tasks by first pretraining the transformer model with language translation and then fine-tuning the pretrained transformer model to solve the downstream task of symbolic mathematics. We achieve comparable accuracy on the integration task with our pretrained model while using around $1.5$ orders of magnitude less number of training samples with respect to the state-of-the-art deep learning for symbolic mathematics. The test accuracy on differential equation tasks is considerably lower comparing with integration as they need higher order recursions that are not present in language translations. We propose the generalizability of our pretrained language model from Anna Karenina Principle (AKP). We pretrain our model with different pairs of language translations. Our results show language bias in solving symbolic mathematics tasks. Finally, we study the robustness of the fine-tuned model on symbolic math tasks against distribution shift, and our approach generalizes better in distribution shift scenarios for the function integration.
    Vision-based route following by an embodied insect-inspired sparse neural network. (arXiv:2303.08109v1 [cs.NE])
    We compared the efficiency of the FlyHash model, an insect-inspired sparse neural network (Dasgupta et al., 2017), to similar but non-sparse models in an embodied navigation task. This requires a model to control steering by comparing current visual inputs to memories stored along a training route. We concluded the FlyHash model is more efficient than others, especially in terms of data encoding.
    Do Transformers Parse while Predicting the Masked Word?. (arXiv:2303.08117v1 [cs.CL])
    Pre-trained language models have been shown to encode linguistic structures, e.g. dependency and constituency parse trees, in their embeddings while being trained on unsupervised loss functions like masked language modeling. Some doubts have been raised whether the models actually are doing parsing or only some computation weakly correlated with it. We study questions: (a) Is it possible to explicitly describe transformers with realistic embedding dimension, number of heads, etc. that are capable of doing parsing -- or even approximate parsing? (b) Why do pre-trained models capture parsing structure? This paper takes a step toward answering these questions in the context of generative modeling with PCFGs. We show that masked language models like BERT or RoBERTa of moderate sizes can approximately execute the Inside-Outside algorithm for the English PCFG [Marcus et al, 1993]. We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data. We also give a construction of transformers with $50$ layers, $15$ attention heads, and $1275$ dimensional embeddings in average such that using its embeddings it is possible to do constituency parsing with $>70\%$ F1 score on PTB dataset. We conduct probing experiments on models pre-trained on PCFG-generated data to show that this not only allows recovery of approximate parse tree, but also recovers marginal span probabilities computed by the Inside-Outside algorithm, which suggests an implicit bias of masked language modeling towards this algorithm.
    MeshDiffusion: Score-based Generative 3D Mesh Modeling. (arXiv:2303.08133v1 [cs.GR])
    We consider the task of generating realistic 3D shapes, which is useful for a variety of applications such as automatic scene generation and physical simulation. Compared to other 3D representations like voxels and point clouds, meshes are more desirable in practice, because (1) they enable easy and arbitrary manipulation of shapes for relighting and simulation, and (2) they can fully leverage the power of modern graphics pipelines which are mostly optimized for meshes. Previous scalable methods for generating meshes typically rely on sub-optimal post-processing, and they tend to produce overly-smooth or noisy surfaces without fine-grained geometric details. To overcome these shortcomings, we take advantage of the graph structure of meshes and use a simple yet very effective generative modeling method to generate 3D meshes. Specifically, we represent meshes with deformable tetrahedral grids, and then train a diffusion model on this direct parametrization. We demonstrate the effectiveness of our model on multiple generative tasks.
    Is Nash Equilibrium Approximator Learnable?. (arXiv:2108.07472v6 [cs.GT] UPDATED)
    In this paper, we investigate the learnability of the function approximator that approximates Nash equilibrium (NE) for games generated from a distribution. First, we offer a generalization bound using the Probably Approximately Correct (PAC) learning model. The bound describes the gap between the expected loss and empirical loss of the NE approximator. Afterward, we prove the agnostic PAC learnability of the Nash approximator. In addition to theoretical analysis, we demonstrate an application of NE approximator in experiments. The trained NE approximator can be used to warm-start and accelerate classical NE solvers. Together, our results show the practicability of approximating NE through function approximation.
    A Theory of Emergent In-Context Learning as Implicit Structure Induction. (arXiv:2303.07971v1 [cs.CL])
    Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on recombination of compositional operations found in natural language data. We derive an information-theoretic bound showing how in-context learning abilities arise from generic next-token prediction when the pretraining distribution has sufficient amounts of compositional structure, under linguistically motivated assumptions. A second bound provides a theoretical justification for the empirical success of prompting LLMs to output intermediate steps towards an answer. To validate theoretical predictions, we introduce a controlled setup for inducing in-context learning; unlike previous approaches, it accounts for the compositional nature of language. Trained transformers can perform in-context learning for a range of tasks, in a manner consistent with the theoretical results. Mirroring real-world LLMs in a miniature setup, in-context learning emerges when scaling parameters and data, and models perform better when prompted to output intermediate steps. Probing shows that in-context learning is supported by a representation of the input's compositional structure. Taken together, these results provide a step towards theoretical understanding of emergent behavior in large language models.
    Fast Rates for Maximum Entropy Exploration. (arXiv:2303.08059v1 [stat.ML])
    We consider the reinforcement learning (RL) setting, in which the agent has to act in unknown environment driven by a Markov Decision Process (MDP) with sparse or even reward free signals. In this situation, exploration becomes the main challenge. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization that was previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose an algorithm based on a game theoretic representation that has $\widetilde{\mathcal{O}}(H^3 S^2 A / \varepsilon^2)$ sample complexity thus improving the $\varepsilon$-dependence of Hazan et al. (2019), where $S$ is a number of states, $A$ is a number of actions, $H$ is an episode length, and $\varepsilon$ is a desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to the entropy-regularized MDPs, and we propose a simple modification of the UCBVI algorithm that has a sample complexity of order $\widetilde{\mathcal{O}}(1/\varepsilon)$ ignoring dependence in $S, A, H$. Interestingly enough, it is the first theoretical result in RL literature establishing that the exploration problem for the regularized MDPs can be statistically strictly easier (in terms of sample complexity) than for the ordinary MDPs.
    FPUS23: An Ultrasound Fetus Phantom Dataset with Deep Neural Network Evaluations for Fetus Orientations, Fetal Planes, and Anatomical Features. (arXiv:2303.07852v1 [eess.IV])
    Ultrasound imaging is one of the most prominent technologies to evaluate the growth, progression, and overall health of a fetus during its gestation. However, the interpretation of the data obtained from such studies is best left to expert physicians and technicians who are trained and well-versed in analyzing such images. To improve the clinical workflow and potentially develop an at-home ultrasound-based fetal monitoring platform, we present a novel fetus phantom ultrasound dataset, FPUS23, which can be used to identify (1) the correct diagnostic planes for estimating fetal biometric values, (2) fetus orientation, (3) their anatomical features, and (4) bounding boxes of the fetus phantom anatomies at 23 weeks gestation. The entire dataset is composed of 15,728 images, which are used to train four different Deep Neural Network models, built upon a ResNet34 backbone, for detecting aforementioned fetus features and use-cases. We have also evaluated the models trained using our FPUS23 dataset, to show that the information learned by these models can be used to substantially increase the accuracy on real-world ultrasound fetus datasets. We make the FPUS23 dataset and the pre-trained models publicly accessible at https://github.com/bharathprabakaran/FPUS23, which will further facilitate future research on fetal ultrasound imaging and analysis.
    Meta contrastive label correction for financial time series. (arXiv:2303.08103v1 [cs.LG])
    Financial applications such as stock price forecasting, usually face an issue that under the predefined labeling rules, it is hard to accurately predict the directions of stock movement. This is because traditional ways of labeling, taking Triple Barrier Method, for example, usually gives us inaccurate or even corrupted labels. To address this issue, we focus on two main goals. One is that our proposed method can automatically generate correct labels for noisy time series patterns, while at the same time, the method is capable of boosting classification performance on this new labeled dataset. Based on the aforementioned goals, our approach has the following three novelties: First, we fuse a new contrastive learning algorithm into the meta-learning framework to estimate correct labels iteratively when updating the classification model inside. Moreover, we utilize images generated from time series data through Gramian angular field and representative learning. Most important of all, we adopt multi-task learning to forecast temporal-variant labels. In the experiments, we work on 6% clean data and the rest unlabeled data. It is shown that our method is competitive and outperforms a lot compared with benchmarks.
    Understanding Model Complexity for temporal tabular and multi-variate time series, case study with Numerai data science tournament. (arXiv:2303.07925v1 [cs.LG])
    In this paper, we explore the use of different feature engineering and dimensionality reduction methods in multi-variate time-series modelling. Using a feature-target cross correlation time series dataset created from Numerai tournament, we demonstrate under over-parameterised regime, both the performance and predictions from different feature engineering methods converge to the same equilibrium, which can be characterised by the reproducing kernel Hilbert space. We suggest a new Ensemble method, which combines different random non-linear transforms followed by ridge regression for modelling high dimensional time-series. Compared to some commonly used deep learning models for sequence modelling, such as LSTM and transformers, our method is more robust (lower model variance over different random seeds and less sensitive to the choice of architecture) and more efficient. An additional advantage of our method is model simplicity as there is no need to use sophisticated deep learning frameworks such as PyTorch. The learned feature rankings are then applied to the temporal tabular prediction problem in the Numerai tournament, and the predictive power of feature rankings obtained from our method is better than the baseline prediction model based on moving averages
    Practically Solving LPN in High Noise Regimes Faster Using Neural Networks. (arXiv:2303.07987v1 [cs.LG])
    We conduct a systematic study of solving the learning parity with noise problem (LPN) using neural networks. Our main contribution is designing families of two-layer neural networks that practically outperform classical algorithms in high-noise, low-dimension regimes. We consider three settings where the numbers of LPN samples are abundant, very limited, and in between. In each setting we provide neural network models that solve LPN as fast as possible. For some settings we are also able to provide theories that explain the rationale of the design of our models. Comparing with the previous experiments of Esser, Kubler, and May (CRYPTO 2017), for dimension $n = 26$, noise rate $\tau = 0.498$, the ''Guess-then-Gaussian-elimination'' algorithm takes 3.12 days on 64 CPU cores, whereas our neural network algorithm takes 66 minutes on 8 GPUs. Our algorithm can also be plugged into the hybrid algorithms for solving middle or large dimension LPN instances.
    BODEGA: Benchmark for Adversarial Example Generation in Credibility Assessment. (arXiv:2303.08032v1 [cs.CL])
    Text classification methods have been widely investigated as a way to detect content of low credibility: fake news, social media bots, propaganda, etc. Quite accurate models (likely based on deep neural networks) help in moderating public electronic platforms and often cause content creators to face rejection of their submissions or removal of already published texts. Having the incentive to evade further detection, content creators try to come up with a slightly modified version of the text (known as an attack with an adversarial example) that exploit the weaknesses of classifiers and result in a different output. Here we introduce BODEGA: a benchmark for testing both victim models and attack methods on four misinformation detection tasks in an evaluation framework designed to simulate real use-cases of content moderation. We also systematically test the robustness of popular text classifiers against available attacking techniques and discover that, indeed, in some cases barely significant changes in input text can mislead the models. We openly share the BODEGA code and data in hope of enhancing the comparability and replicability of further research in this area.
    Partial Neural Optimal Transport. (arXiv:2303.07988v1 [cs.LG])
    We propose a novel neural method to compute partial optimal transport (OT) maps, i.e., OT maps between parts of measures of the specified masses. We test our partial neural optimal transport algorithm on synthetic examples.
    Good Neighbors Are All You Need for Chinese Grapheme-to-Phoneme Conversion. (arXiv:2303.07726v1 [cs.CL])
    Most Chinese Grapheme-to-Phoneme (G2P) systems employ a three-stage framework that first transforms input sequences into character embeddings, obtains linguistic information using language models, and then predicts the phonemes based on global context about the entire input sequence. However, linguistic knowledge alone is often inadequate. Language models frequently encode overly general structures of a sentence and fail to cover specific cases needed to use phonetic knowledge. Also, a handcrafted post-processing system is needed to address the problems relevant to the tone of the characters. However, the system exhibits inconsistency in the segmentation of word boundaries which consequently degrades the performance of the G2P system. To address these issues, we propose the Reinforcer that provides strong inductive bias for language models by emphasizing the phonological information between neighboring characters to help disambiguate pronunciations. Experimental results show that the Reinforcer boosts the cutting-edge architectures by a large margin. We also combine the Reinforcer with a large-scale pre-trained model and demonstrate the validity of using neighboring context in knowledge transfer scenarios.
    A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition. (arXiv:2303.08027v1 [eess.AS])
    As a common way of emotion signaling via non-linguistic vocalizations, vocal burst (VB) plays an important role in daily social interaction. Understanding and modeling human vocal bursts are indispensable for developing robust and general artificial intelligence. Exploring computational approaches for understanding vocal bursts is attracting increasing research attention. In this work, we propose a hierarchical framework, based on chain regression models, for affective recognition from VBs, that explicitly considers multiple relationships: (i) between emotional states and diverse cultures; (ii) between low-dimensional (arousal & valence) and high-dimensional (10 emotion classes) emotion spaces; and (iii) between various emotion classes within the high-dimensional space. To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules. The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO'' and "CULTURE'' tasks. Experimental results based on the ACII Challenge 2022 dataset demonstrate the superior performance of the proposed system and the effectiveness of considering multiple relationships using hierarchical regression chain models.
    Large statistical learning models effectively forecast diverse chaotic systems. (arXiv:2303.08011v1 [cs.LG])
    Chaos and unpredictability are traditionally synonymous, yet recent advances in statistical forecasting suggest that large machine learning models can derive unexpected insight from extended observation of complex systems. Here, we study the forecasting of chaos at scale, by performing a large-scale comparison of 24 representative state-of-the-art multivariate forecasting methods on a crowdsourced database of 135 distinct low-dimensional chaotic systems. We find that large, domain-agnostic time series forecasting methods based on artificial neural networks consistently exhibit strong forecasting performance, in some cases producing accurate predictions lasting for dozens of Lyapunov times. Best-in-class results for forecasting chaos are achieved by recently-introduced hierarchical neural basis function models, though even generic transformers and recurrent neural networks perform strongly. However, physics-inspired hybrid methods like neural ordinary equations and reservoir computers contain inductive biases conferring greater data efficiency and lower training times in data-limited settings. We observe consistent correlation across all methods despite their widely-varying architectures, as well as universal structure in how predictions decay over long time intervals. Our results suggest that a key advantage of modern forecasting methods stems not from their architectural details, but rather from their capacity to learn the large-scale structure of chaotic attractors.
    Context Normalization for Robust Image Classification. (arXiv:2303.07651v1 [cs.CV])
    Normalization is a pre-processing step that converts the data into a more usable representation. As part of the deep neural networks (DNNs), the batch normalization (BN) technique uses normalization to address the problem of internal covariate shift. It can be packaged as general modules, which have been extensively integrated into various DNNs, to stabilize and accelerate training, presumably leading to improved generalization. However, the effect of BN is dependent on the mini-batch size and it does not take into account any groups or clusters that may exist in the dataset when estimating population statistics. This study proposes a new normalization technique, called context normalization, for image data. This approach adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance by adapting the data values to the context of the target task. The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
    Text-to-image Diffusion Model in Generative AI: A Survey. (arXiv:2303.07909v1 [cs.CV])
    This survey reviews text-to-image diffusion models in the context that diffusion models have emerged to be popular for a wide range of generative tasks. As a self-contained work, this survey starts with a brief introduction of how a basic diffusion model works for image synthesis, followed by how condition or guidance improves learning. Based on that, we present a review of state-of-the-art methods on text-conditioned image synthesis, i.e., text-to-image. We further summarize applications beyond text-to-image generation: text-guided creative generation and text-guided image editing. Beyond the progress made so far, we discuss existing challenges and promising future directions.
    Demographic Parity Inspector: Fairness Audits via the Explanation Space. (arXiv:2303.08040v1 [cs.LG])
    Even if deployed with the best intentions, machine learning methods can perpetuate, amplify or even create social biases. Measures of (un-)fairness have been proposed as a way to gauge the (non-)discriminatory nature of machine learning models. However, proxies of protected attributes causing discriminatory effects remain challenging to address. In this work, we propose a new algorithmic approach that measures group-wise demographic parity violations and allows us to inspect the causes of inter-group discrimination. Our method relies on the novel idea of measuring the dependence of a model on the protected attribute based on the explanation space, an informative space that allows for more sensitive audits than the primary space of input data or prediction distributions, and allowing for the assertion of theoretical demographic parity auditing guarantees. We provide a mathematical analysis, synthetic examples, and experimental evaluation of real-world data. We release an open-source Python package with methods, routines, and tutorials.
    Best arm identification in rare events. (arXiv:2303.07627v1 [cs.LG])
    We consider the best arm identification problem in the stochastic multi-armed bandit framework where each arm has a tiny probability of realizing large rewards while with overwhelming probability the reward is zero. A key application of this framework is in online advertising where click rates of advertisements could be a fraction of a single percent and final conversion to sales, while highly profitable, may again be a small fraction of the click rates. Lately, algorithms for BAI problems have been developed that minimise sample complexity while providing statistical guarantees on the correct arm selection. As we observe, these algorithms can be computationally prohibitive. We exploit the fact that the reward process for each arm is well approximated by a Compound Poisson process to arrive at algorithms that are faster, with a small increase in sample complexity. We analyze the problem in an asymptotic regime as rarity of reward occurrence reduces to zero, and reward amounts increase to infinity. This helps illustrate the benefits of the proposed algorithm. It also sheds light on the underlying structure of the optimal BAI algorithms in the rare event setting.
    Bayes Complexity of Learners vs Overfitting. (arXiv:2303.07874v1 [cs.LG])
    We introduce a new notion of complexity of functions and we show that it has the following properties: (i) it governs a PAC Bayes-like generalization bound, (ii) for neural networks it relates to natural notions of complexity of functions (such as the variation), and (iii) it explains the generalization gap between neural networks and linear schemes. While there is a large set of papers which describes bounds that have each such property in isolation, and even some that have two, as far as we know, this is a first notion that satisfies all three of them. Moreover, in contrast to previous works, our notion naturally generalizes to neural networks with several layers. Even though the computation of our complexity is nontrivial in general, an upper-bound is often easy to derive, even for higher number of layers and functions with structure, such as period functions. An upper-bound we derive allows to show a separation in the number of samples needed for good generalization between 2 and 4-layer neural networks for periodic functions.
    WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning. (arXiv:2301.04488v2 [cs.SD] UPDATED)
    Although deep learning has revolutionized music generation, existing methods for structured melody generation follow an end-to-end left-to-right note-by-note generative paradigm and treat each note equally. Here, we present WuYun, a knowledge-enhanced deep learning architecture for improving the structure of generated melodies, which first generates the most structurally important notes to construct a melodic skeleton and subsequently infills it with dynamically decorative notes into a full-fledged melody. Specifically, we use music domain knowledge to extract melodic skeletons and employ sequence learning to reconstruct them, which serve as additional knowledge to provide auxiliary guidance for the melody generation process. We demonstrate that WuYun can generate melodies with better long-term structure and musicality and outperforms other state-of-the-art methods by 0.51 on average on all subjective evaluation metrics. Our study provides a multidisciplinary lens to design melodic hierarchical structures and bridge the gap between data-driven and knowledge-based approaches for numerous music generation tasks.
    Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers. (arXiv:2303.07991v1 [cs.CL])
    Long-sequence transformers are designed to improve the representation of longer texts by language models and their performance on downstream document-level tasks. However, not much is understood about the quality of token-level predictions in long-form models. We investigate the performance of such architectures in the context of document classification with unsupervised rationale extraction. We find standard soft attention methods to perform significantly worse when combined with the Longformer language model. We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token-level. We find this method to significantly outperform Longformer-driven baselines on sentiment classification datasets, while also exhibiting significantly lower runtimes.
    X-Former: In-Memory Acceleration of Transformers. (arXiv:2303.07470v1 [cs.LG])
    Transformers have achieved great success in a wide variety of natural language processing (NLP) tasks due to the attention mechanism, which assigns an importance score for every word relative to other words in a sequence. However, these models are very large, often reaching hundreds of billions of parameters, and therefore require a large number of DRAM accesses. Hence, traditional deep neural network (DNN) accelerators such as GPUs and TPUs face limitations in processing Transformers efficiently. In-memory accelerators based on non-volatile memory promise to be an effective solution to this challenge, since they provide high storage density while performing massively parallel matrix vector multiplications within memory arrays. However, attention score computations, which are frequently used in Transformers (unlike CNNs and RNNs), require matrix vector multiplications (MVM) where both operands change dynamically for each input. As a result, conventional NVM-based accelerators incur high write latency and write energy when used for Transformers, and further suffer from the low endurance of most NVM technologies. To address these challenges, we present X-Former, a hybrid in-memory hardware accelerator that consists of both NVM and CMOS processing elements to execute transformer workloads efficiently. To improve the hardware utilization of X-Former, we also propose a sequence blocking dataflow, which overlaps the computations of the two processing elements and reduces execution time. Across several benchmarks, we show that X-Former achieves upto 85x and 7.5x improvements in latency and energy over a NVIDIA GeForce GTX 1060 GPU and upto 10.7x and 4.6x improvements in latency and energy over a state-of-the-art in-memory NVM accelerator.
    Kinematic Data-Based Action Segmentation for Surgical Applications. (arXiv:2303.07814v1 [cs.CV])
    Action segmentation is a challenging task in high-level process analysis, typically performed on video or kinematic data obtained from various sensors. In the context of surgical procedures, action segmentation is critical for workflow analysis algorithms. This work presents two contributions related to action segmentation on kinematic data. Firstly, we introduce two multi-stage architectures, MS-TCN-BiLSTM and MS-TCN-BiGRU, specifically designed for kinematic data. The architectures consist of a prediction generator with intra-stage regularization and Bidirectional LSTM or GRU-based refinement stages. Secondly, we propose two new data augmentation techniques, World Frame Rotation and Horizontal-Flip, which utilize the strong geometric structure of kinematic data to improve algorithm performance and robustness. We evaluate our models on three datasets of surgical suturing tasks: the Variable Tissue Simulation (VTS) Dataset and the newly introduced Bowel Repair Simulation (BRS) Dataset, both of which are open surgery simulation datasets collected by us, as well as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a well-known benchmark in robotic surgery. Our methods achieve state-of-the-art performance on all benchmark datasets and establish a strong baseline for the BRS dataset.
    Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection. (arXiv:2303.08019v1 [eess.AS])
    With the global population aging rapidly, Alzheimer's disease (AD) is particularly prominent in older adults, which has an insidious onset and leads to a gradual, irreversible deterioration in cognitive domains (memory, communication, etc.). Speech-based AD detection opens up the possibility of widespread screening and timely disease intervention. Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations. This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features. Based on these features, the paper also proposes a novel task-oriented approach by modeling the relationship between the participants' description and the cognitive task. Experiments are carried out on the ADReSS dataset in a binary classification setup, and models are evaluated on the unseen test set. Results and comparison with recent literature demonstrate the efficiency and superior performance of proposed acoustic, linguistic and task-oriented methods. The findings also show the importance of semantic and syntactic information, and feasibility of automation and generalization with the promising audio-only and task-oriented methods for the AD detection task.
    On the Connection between Concept Drift and Uncertainty in Industrial Artificial Intelligence. (arXiv:2303.07940v1 [cs.LG])
    AI-based digital twins are at the leading edge of the Industry 4.0 revolution, which are technologically empowered by the Internet of Things and real-time data analysis. Information collected from industrial assets is produced in a continuous fashion, yielding data streams that must be processed under stringent timing constraints. Such data streams are usually subject to non-stationary phenomena, causing that the data distribution of the streams may change, and thus the knowledge captured by models used for data analysis may become obsolete (leading to the so-called concept drift effect). The early detection of the change (drift) is crucial for updating the model's knowledge, which is challenging especially in scenarios where the ground truth associated to the stream data is not readily available. Among many other techniques, the estimation of the model's confidence has been timidly suggested in a few studies as a criterion for detecting drifts in unsupervised settings. The goal of this manuscript is to confirm and expose solidly the connection between the model's confidence in its output and the presence of a concept drift, showcasing it experimentally and advocating for a major consideration of uncertainty estimation in comparative studies to be reported in the future.
    Recent Advances and Applications of Machine Learning in Experimental Solid Mechanics: A Review. (arXiv:2303.07647v1 [cs.LG])
    For many decades, experimental solid mechanics has played a crucial role in characterizing and understanding the mechanical properties of natural and novel materials. Recent advances in machine learning (ML) provide new opportunities for the field, including experimental design, data analysis, uncertainty quantification, and inverse problems. As the number of papers published in recent years in this emerging field is exploding, it is timely to conduct a comprehensive and up-to-date review of recent ML applications in experimental solid mechanics. Here, we first provide an overview of common ML algorithms and terminologies that are pertinent to this review, with emphasis placed on physics-informed and physics-based ML methods. Then, we provide thorough coverage of recent ML applications in traditional and emerging areas of experimental mechanics, including fracture mechanics, biomechanics, nano- and micro-mechanics, architected materials, and 2D material. Finally, we highlight some current challenges of applying ML to multi-modality and multi-fidelity experimental datasets and propose several future research directions. This review aims to provide valuable insights into the use of ML methods as well as a variety of examples for researchers in solid mechanics to integrate into their experiments.
    Window-Based Early-Exit Cascades for Uncertainty Estimation: When Deep Ensembles are More Efficient than Single Models. (arXiv:2303.08010v1 [cs.LG])
    Deep Ensembles are a simple, reliable, and effective method of improving both the predictive performance and uncertainty estimates of deep learning approaches. However, they are widely criticised as being computationally expensive, due to the need to deploy multiple independent models. Recent work has challenged this view, showing that for predictive accuracy, ensembles can be more computationally efficient (at inference) than scaling single models within an architecture family. This is achieved by cascading ensemble members via an early-exit approach. In this work, we investigate extending these efficiency gains to tasks related to uncertainty estimation. As many such tasks, e.g. selective classification, are binary classification, our key novel insight is to only pass samples within a window close to the binary decision boundary to later cascade stages. Experiments on ImageNet-scale data across a number of network architectures and uncertainty tasks show that the proposed window-based early-exit approach is able to achieve a superior uncertainty-computation trade-off compared to scaling single models. For example, a cascaded EfficientNet-B2 ensemble is able to achieve similar coverage at 5% risk as a single EfficientNet-B4 with <30% the number of MACs. We also find that cascades/ensembles give more reliable improvements on OOD data vs scaling models up. Code for this work is available at: https://github.com/Guoxoug/window-early-exit.
    Improving Accented Speech Recognition with Multi-Domain Training. (arXiv:2303.07924v1 [cs.LG])
    Thanks to the rise of self-supervised learning, automatic speech recognition (ASR) systems now achieve near-human performance on a wide variety of datasets. However, they still lack generalization capability and are not robust to domain shifts like accent variations. In this work, we use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models. By incorporating various accents in the training set, we obtain both in-domain and out-of-domain improvements. Our numerical experiments show that we can reduce error rates by up to 25% (relative) on African and Belgian accents compared to single-domain training while keeping a good performance on standard French.
    Teacher-Student Knowledge Distillation for Radar Perception on Embedded Accelerators. (arXiv:2303.07586v1 [cs.AI])
    Many radar signal processing methodologies are being developed for critical road safety perception tasks. Unfortunately, these signal processing algorithms are often poorly suited to run on embedded hardware accelerators used in automobiles. Conversely, end-to-end machine learning (ML) approaches better exploit the performance gains brought by specialized accelerators. In this paper, we propose a teacher-student knowledge distillation approach for low-level radar perception tasks. We utilize a hybrid model for stationary object detection as a teacher to train an end-to-end ML student model. The student can efficiently harness embedded compute for real-time deployment. We demonstrate that the proposed student model runs at speeds 100x faster than the teacher model.
    Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models. (arXiv:2303.07735v1 [cs.AI])
    Creating learning models that can exhibit sophisticated reasoning skills is one of the greatest challenges in deep learning research, and mathematics is rapidly becoming one of the target domains for assessing scientific progress in this direction. In the past few years there has been an explosion of neural network architectures, data sets, and benchmarks specifically designed to tackle mathematical problems, reporting notable success in disparate fields such as automated theorem proving, numerical integration, and discovery of new conjectures or matrix multiplication algorithms. However, despite these impressive achievements it is still unclear whether deep learning models possess an elementary understanding of quantities and symbolic numbers. In this survey we critically examine the recent literature, concluding that even state-of-the-art architectures often fall short when probed with relatively simple tasks designed to test basic numerical and arithmetic knowledge.
    GANN: Graph Alignment Neural Network for Semi-Supervised Learning. (arXiv:2303.07778v1 [cs.LG])
    Graph neural networks (GNNs) have been widely investigated in the field of semi-supervised graph machine learning. Most methods fail to exploit adequate graph information when labeled data is limited, leading to the problem of oversmoothing. To overcome this issue, we propose the Graph Alignment Neural Network (GANN), a simple and effective graph neural architecture. A unique learning algorithm with three alignment rules is proposed to thoroughly explore hidden information for insufficient labels. Firstly, to better investigate attribute specifics, we suggest the feature alignment rule to align the inner product of both the attribute and embedding matrices. Secondly, to properly utilize the higher-order neighbor information, we propose the cluster center alignment rule, which involves aligning the inner product of the cluster center matrix with the unit matrix. Finally, to get reliable prediction results with few labels, we establish the minimum entropy alignment rule by lining up the prediction probability matrix with its sharpened result. Extensive studies on graph benchmark datasets demonstrate that GANN can achieve considerable benefits in semi-supervised node classification and outperform state-of-the-art competitors.
    ICICLE: Interpretable Class Incremental Continual Learning. (arXiv:2303.07811v1 [cs.LG])
    Continual learning enables incremental learning of new tasks without forgetting those previously learned, resulting in positive knowledge transfer that can enhance performance on both new and old tasks. However, continual learning poses new challenges for interpretability, as the rationale behind model predictions may change over time, leading to interpretability concept drift. We address this problem by proposing Interpretable Class-InCremental LEarning (ICICLE), an exemplar-free approach that adopts a prototypical part-based approach. It consists of three crucial novelties: interpretability regularization that distills previously learned concepts while preserving user-friendly positive reasoning; proximity-based prototype initialization strategy dedicated to the fine-grained setting; and task-recency bias compensation devoted to prototypical parts. Our experimental results demonstrate that ICICLE reduces the interpretability concept drift and outperforms the existing exemplar-free methods of common class-incremental learning when applied to concept-based models. We make the code available.
    Constrained Adversarial Learning and its applicability to Automated Software Testing: a systematic review. (arXiv:2303.07546v1 [cs.SE])
    Every novel technology adds hidden vulnerabilities ready to be exploited by a growing number of cyber-attacks. Automated software testing can be a promising solution to quickly analyze thousands of lines of code by generating and slightly modifying function-specific testing data to encounter a multitude of vulnerabilities and attack vectors. This process draws similarities to the constrained adversarial examples generated by adversarial learning methods, so there could be significant benefits to the integration of these methods in automated testing tools. Therefore, this systematic review is focused on the current state-of-the-art of constrained data generation methods applied for adversarial learning and software testing, aiming to guide researchers and developers to enhance testing tools with adversarial learning methods and improve the resilience and robustness of their digital systems. The found constrained data generation applications for adversarial machine learning were systematized, and the advantages and limitations of approaches specific for software testing were thoroughly analyzed, identifying research gaps and opportunities to improve testing tools with adversarial attack methods.
    Koos Classification of Vestibular Schwannoma via Image Translation-Based Unsupervised Cross-Modality Domain Adaptation. (arXiv:2303.07674v1 [eess.IV])
    The Koos grading scale is a classification system for vestibular schwannoma (VS) used to characterize the tumor and its effects on adjacent brain structures. The Koos classification captures many of the characteristics of treatment deci-sions and is often used to determine treatment plans. Although both contrast-enhanced T1 (ceT1) scanning and high-resolution T2 (hrT2) scanning can be used for Koos Classification, hrT2 scanning is gaining interest because of its higher safety and cost-effectiveness. However, in the absence of annotations for hrT2 scans, deep learning methods often inevitably suffer from performance deg-radation due to unsupervised learning. If ceT1 scans and their annotations can be used for unsupervised learning of hrT2 scans, the performance of Koos classifi-cation using unlabeled hrT2 scans will be greatly improved. In this regard, we propose an unsupervised cross-modality domain adaptation method based on im-age translation by transforming annotated ceT1 scans into hrT2 modality and us-ing their annotations to achieve supervised learning of hrT2 modality. Then, the VS and 7 adjacent brain structures related to Koos classification in hrT2 scans were segmented. Finally, handcrafted features are extracted from the segmenta-tion results, and Koos grade is classified using a random forest classifier. The proposed method received rank 1 on the Koos classification task of the Cross-Modality Domain Adaptation (crossMoDA 2022) challenge, with Macro-Averaged Mean Absolute Error (MA-MAE) of 0.2148 for the validation set and 0.26 for the test set.
    Fast exploration and learning of latent graphs with aliased observations. (arXiv:2303.07397v1 [cs.LG])
    Consider this scenario: an agent navigates a latent graph by performing actions that take it from one node to another. The chosen action determines the probability distribution over the next visited node. At each node, the agent receives an observation, but this observation is not unique, so it does not identify the node, making the problem aliased. The purpose of this work is to provide a policy that approximately maximizes exploration efficiency (i.e., how well the graph is recovered for a given exploration budget). In the unaliased case, we show improved performance w.r.t. state-of-the-art reinforcement learning baselines. For the aliased case we are not aware of suitable baselines and instead show faster recovery w.r.t. a random policy for a wide variety of topologies, and exponentially faster recovery than a random policy for challenging topologies. We dub the algorithm eFeX (from eFficient eXploration).
    Sinkhorn-Flow: Predicting Probability Mass Flow in Dynamical Systems Using Optimal Transport. (arXiv:2303.07675v1 [cs.LG])
    Predicting how distributions over discrete variables vary over time is a common task in time series forecasting. But whereas most approaches focus on merely predicting the distribution at subsequent time steps, a crucial piece of information in many settings is to determine how this probability mass flows between the different elements over time. We propose a new approach to predicting such mass flow over time using optimal transport. Specifically, we propose a generic approach to predicting transport matrices in end-to-end deep learning systems, replacing the standard softmax operation with Sinkhorn iterations. We apply our approach to the task of predicting how communities will evolve over time in social network settings, and show that the approach improves substantially over alternative prediction methods. We specifically highlight results on the task of predicting faction evolution in Ukrainian parliamentary voting.
    BoundaryCAM: A Boundary-based Refinement Framework for Weakly Supervised Semantic Segmentation of Medical Images. (arXiv:2303.07853v1 [cs.CV])
    Weakly Supervised Semantic Segmentation (WSSS) with only image-level supervision is a promising approach to deal with the need for Segmentation networks, especially for generating a large number of pixel-wise masks in a given dataset. However, most state-of-the-art image-level WSSS techniques lack an understanding of the geometric features embedded in the images since the network cannot derive any object boundary information from just image-level labels. We define a boundary here as the line separating an object and its background, or two different objects. To address this drawback, we propose our novel BoundaryCAM framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques in order to achieve fine-grained higher-accuracy segmentation masks. To achieve this, we investigate a state-of-the-art unsupervised semantic segmentation network that can be used to construct a boundary map, which enables BoundaryCAM to predict object locations with sharper boundaries. By applying our method to WSSS predictions, we were able to achieve up to 10% improvements even to the benefit of the current state-of-the-art WSSS methods for medical imaging. The framework is open-source and accessible online at https://github.com/bharathprabakaran/BoundaryCAM.
    Fractional dynamics foster deep learning of COPD stage prediction. (arXiv:2303.07537v1 [cs.LG])
    Chronic obstructive pulmonary disease (COPD) is one of the leading causes of death worldwide. Current COPD diagnosis (i.e., spirometry) could be unreliable because the test depends on an adequate effort from the tester and testee. Moreover, the early diagnosis of COPD is challenging. We address COPD detection by constructing two novel physiological signals datasets (4432 records from 54 patients in the WestRo COPD dataset and 13824 medical records from 534 patients in the WestRo Porti COPD dataset). The authors demonstrate their complex coupled fractal dynamical characteristics and perform a fractional-order dynamics deep learning analysis to diagnose COPD. The authors found that the fractional-order dynamical modeling can extract distinguishing signatures from the physiological signals across patients with all COPD stages from stage 0 (healthy) to stage 4 (very severe). They use the fractional signatures to develop and train a deep neural network that predicts COPD stages based on the input features (such as thorax breathing effort, respiratory rate, or oxygen saturation). The authors show that the fractional dynamic deep learning model (FDDLM) achieves a COPD prediction accuracy of 98.66% and can serve as a robust alternative to spirometry. The FDDLM also has high accuracy when validated on a dataset with different physiological signals.
    Domain Generalization via Nuclear Norm Regularization. (arXiv:2303.07527v1 [cs.LG])
    The ability to generalize to unseen domains is crucial for machine learning systems deployed in the real world, especially when we only have data from limited training domains. In this paper, we propose a simple and effective regularization method based on the nuclear norm of the learned features for domain generalization. Intuitively, the proposed regularizer mitigates the impacts of environmental features and encourages learning domain-invariant features. Theoretically, we provide insights into why nuclear norm regularization is more effective compared to ERM and alternative regularization methods. Empirically, we conduct extensive experiments on both synthetic and real datasets. We show that nuclear norm regularization achieves strong performance compared to baselines in a wide range of domain generalization tasks. Moreover, our regularizer is broadly applicable with various methods such as ERM and SWAD with consistently improved performance, e.g., 1.7% and 0.9% test accuracy improvements respectively on the DomainBed benchmark.
    RE-MOVE: An Adaptive Policy Design Approach for Dynamic Environments via Language-Based Feedback. (arXiv:2303.07622v1 [cs.RO])
    Reinforcement learning-based policies for continuous control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures. To address this limitation, we propose a novel approach called RE-MOVE (\textbf{RE}quest help and \textbf{MOVE} on), which uses language-based feedback to adjust trained policies to real-time changes in the environment. In this work, we enable the trained policy to decide \emph{when to ask for feedback} and \emph{how to incorporate feedback into trained policies}. RE-MOVE incorporates epistemic uncertainty to determine the optimal time to request feedback from humans and uses language-based feedback for real-time adaptation. We perform extensive synthetic and real-world evaluations to demonstrate the benefits of our proposed approach in several test-time dynamic navigation scenarios. Our approach enable robots to learn from human feedback and adapt to previously unseen adversarial situations.
    Traffic4cast at NeurIPS 2022 -- Predict Dynamics along Graph Edges from Sparse Node Data: Whole City Traffic and ETA from Stationary Vehicle Detectors. (arXiv:2303.07758v1 [cs.LG])
    The global trends of urbanization and increased personal mobility force us to rethink the way we live and use urban space. The Traffic4cast competition series tackles this problem in a data-driven way, advancing the latest methods in machine learning for modeling complex spatial systems over time. In this edition, our dynamic road graph data combine information from road maps, $10^{12}$ probe data points, and stationary vehicle detectors in three cities over the span of two years. While stationary vehicle detectors are the most accurate way to capture traffic volume, they are only available in few locations. Traffic4cast 2022 explores models that have the ability to generalize loosely related temporal vertex data on just a few nodes to predict dynamic future traffic states on the edges of the entire road graph. In the core challenge, participants are invited to predict the likelihoods of three congestion classes derived from the speed levels in the GPS data for the entire road graph in three cities 15 min into the future. We only provide vehicle count data from spatially sparse stationary vehicle detectors in these three cities as model input for this task. The data are aggregated in 15 min time bins for one hour prior to the prediction time. For the extended challenge, participants are tasked to predict the average travel times on super-segments 15 min into the future - super-segments are longer sequences of road segments in the graph. The competition results provide an important advance in the prediction of complex city-wide traffic states just from publicly available sparse vehicle data and without the need for large amounts of real-time floating vehicle data.
    Generalised Scale-Space Properties for Probabilistic Diffusion Models. (arXiv:2303.07900v1 [eess.IV])
    Probabilistic diffusion models enjoy increasing popularity in the deep learning community. They generate convincing samples from a learned distribution of input images with a wide field of practical applications. Originally, these approaches were motivated from drift-diffusion processes, but these origins find less attention in recent, practice-oriented publications. We investigate probabilistic diffusion models from the viewpoint of scale-space research and show that they fulfil generalised scale-space properties on evolving probability distributions. Moreover, we discuss similarities and differences between interpretations of the physical core concept of drift-diffusion in the deep learning and model-based world. To this end, we examine relations of probabilistic diffusion to osmosis filters.
    Forecasting COVID-19 Infections in Gulf Cooperation Council (GCC) Countries using Machine Learning. (arXiv:2303.07600v1 [cs.LG])
    COVID-19 has infected more than 68 million people worldwide since it was first detected about a year ago. Machine learning time series models have been implemented to forecast COVID-19 infections. In this paper, we develop time series models for the Gulf Cooperation Council (GCC) countries using the public COVID-19 dataset from Johns Hopkins. The dataset set includes the one-year cumulative COVID-19 cases between 22/01/2020 to 22/01/2021. We developed different models for the countries under study based on the spatial distribution of the infection data. Our experimental results show that the developed models can forecast COVID-19 infections with high precision.
    From Discrimination to Generation: Knowledge Graph Completion with Generative Transformer. (arXiv:2202.02113v7 [cs.CL] UPDATED)
    Knowledge graph completion aims to address the problem of extending a KG with missing triples. In this paper, we provide an approach GenKGC, which converts knowledge graph completion to sequence-to-sequence generation task with the pre-trained language model. We further introduce relation-guided demonstration and entity-aware hierarchical decoding for better representation learning and fast inference. Experimental results on three datasets show that our approach can obtain better or comparable performance than baselines and achieve faster inference speed compared with previous methods with pre-trained language models. We also release a new large-scale Chinese knowledge graph dataset AliopenKG500 for research purpose. Code and datasets are available in https://github.com/zjunlp/PromptKG/tree/main/GenKGC.
    Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects. (arXiv:2211.02247v2 [eess.AS] UPDATED)
    We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio effects related information from a reference music recording. All our models are trained in a self-supervised manner from an already-processed wet multitrack dataset with an effective data preprocessing method that alleviates the data scarcity of obtaining unprocessed dry data. We analyze the proposed encoder for the disentanglement capability of audio effects and also validate its performance for mixing style transfer through both objective and subjective evaluations. From the results, we show the proposed system not only converts the mixing style of multitrack audio close to a reference but is also robust with mixture-wise style transfer upon using a music source separation model.
    Masked Images Are Counterfactual Samples for Robust Fine-tuning. (arXiv:2303.03052v2 [cs.CV] UPDATED)
    Deep learning models are challenged by the distribution shift between the training data and test data. Recently, the large models pre-trained on diverse data demonstrate unprecedented robustness to various distribution shifts. However, fine-tuning on these models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness. Existing methods for tackling this trade-off do not explicitly address the OOD robustness problem. In this paper, based on causal analysis on the aforementioned problems, we propose a novel fine-tuning method, which use masked images as counterfactual samples that help improving the robustness of the fine-tuning model. Specifically, we mask either the semantics-related or semantics-unrelated patches of the images based on class activation map to break the spurious correlation, and refill the masked patches with patches from other images. The resulting counterfactual samples are used in feature-based distillation with the pre-trained model. Extensive experiments verify that regularizing the fine-tuning with the proposed masked images can achieve a better trade-off between ID and OOD performance, surpassing previous methods on the OOD performance. Our code will be publicly available.
    Explanation Shift: Investigating Interactions between Models and Shifting Data Distributions. (arXiv:2303.08081v1 [cs.LG])
    As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In practice, new input data tend to come without target labels. Then, state-of-the-art techniques model input data distributions or model prediction distributions and try to understand issues regarding the interactions between learned models and shifting distributions. We suggest a novel approach that models how explanation characteristics shift when affected by distribution shifts. We find that the modeling of explanation shifts can be a better indicator for detecting out-of-distribution model behaviour than state-of-the-art techniques. We analyze different types of distribution shifts using synthetic examples and real-world data sets. We provide an algorithmic method that allows us to inspect the interaction between data set features and learned models and compare them to the state-of-the-art. We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
    Reliable Beamforming at Terahertz Bands: Are Causal Representations the Way Forward?. (arXiv:2303.08017v1 [cs.IT])
    Future wireless services, such as the metaverse require high information rate, reliability, and low latency. Multi-user wireless systems can meet such requirements by utilizing the abundant terahertz bandwidth with a massive number of antennas, creating narrow beamforming solutions. However, existing solutions lack proper modeling of channel dynamics, resulting in inaccurate beamforming solutions in high-mobility scenarios. Herein, a dynamic, semantically aware beamforming solution is proposed for the first time, utilizing novel artificial intelligence algorithms in variational causal inference to compute the time-varying dynamics of the causal representation of multi-modal data and the beamforming. Simulations show that the proposed causality-guided approach for Terahertz (THz) beamforming outperforms classical MIMO beamforming techniques.
    Relphormer: Relational Graph Transformer for Knowledge Graph Representations. (arXiv:2205.10852v5 [cs.CL] UPDATED)
    Transformers have achieved remarkable performance in widespread fields, including natural language processing, computer vision and graph mining. However, vanilla Transformer architectures have not yielded promising improvements in the Knowledge Graph (KG) representations, where the translational distance paradigm dominates this area. Note that vanilla Transformer architectures struggle to capture the intrinsically heterogeneous structural and semantic information of knowledge graphs. To this end, we propose a new variant of Transformer for knowledge graph representations dubbed Relphormer. Specifically, we introduce Triple2Seq which can dynamically sample contextualized sub-graph sequences as the input to alleviate the heterogeneity issue. We propose a novel structure-enhanced self-attention mechanism to encode the relational information and keep the semantic information within entities and relations. Moreover, we utilize masked knowledge modeling for general knowledge graph representation learning, which can be applied to various KG-based tasks including knowledge graph completion, question answering, and recommendation. Experimental results on six datasets show that Relphormer can obtain better performance compared with baselines. Code is available in https://github.com/zjunlp/Relphormer.
    SPARF: Large-Scale Learning of 3D Sparse Radiance Fields from Few Input Images. (arXiv:2212.09100v2 [cs.CV] UPDATED)
    Recent advances in Neural Radiance Fields (NeRFs) treat the problem of novel view synthesis as Sparse Radiance Field (SRF) optimization using sparse voxels for efficient and fast rendering (plenoxels,InstantNGP). In order to leverage machine learning and adoption of SRFs as a 3D representation, we present SPARF, a large-scale ShapeNet-based synthetic dataset for novel view synthesis consisting of $\sim$ 17 million images rendered from nearly 40,000 shapes at high resolution (400 X 400 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis and includes more than one million 3D-optimized radiance fields with multiple voxel resolutions. Furthermore, we propose a novel pipeline (SuRFNet) that learns to generate sparse voxel radiance fields from only few views. This is done by using the densely collected SPARF dataset and 3D sparse convolutions. SuRFNet employs partial SRFs from few/one images and a specialized SRF loss to learn to generate high-quality sparse voxel radiance fields that can be rendered from novel views. Our approach achieves state-of-the-art results in the task of unconstrained novel view synthesis based on few views on ShapeNet as compared to recent baselines. The SPARF dataset will be made public with the code and models on the project website https://abdullahamdi.com/sparf/ .
    NIERT: Accurate Numerical Interpolation through Unifying Scattered Data Representations using Transformer Encoder. (arXiv:2209.09078v3 [cs.LG] UPDATED)
    Interpolation for scattered data is a classical problem in numerical analysis, with a long history of theoretical and practical contributions. Recent advances have utilized deep neural networks to construct interpolators, exhibiting excellent and generalizable performance. However, they still fall short in two aspects: \textbf{1) inadequate representation learning}, resulting from separate embeddings of observed and target points in popular encoder-decoder frameworks and \textbf{2) limited generalization power}, caused by overlooking prior interpolation knowledge shared across different domains. To overcome these limitations, we present a \textbf{N}umerical \textbf{I}nterpolation approach using \textbf{E}ncoder \textbf{R}epresentation of \textbf{T}ransformers (called \textbf{NIERT}). On one hand, NIERT utilizes an encoder-only framework rather than the encoder-decoder structure. This way, NIERT can embed observed and target points into a unified encoder representation space, thus effectively exploiting the correlations among them and obtaining more precise representations. On the other hand, we propose to pre-train NIERT on large-scale synthetic mathematical functions to acquire prior interpolation knowledge, and transfer it to multiple interpolation domains with consistent performance gain. On both synthetic and real-world datasets, NIERT outperforms the existing approaches by a large margin, i.e., 4.3$\sim$14.3$\times$ lower MAE on TFRD subsets, and 1.7/1.8/8.7$\times$ lower MSE on Mathit/PhysioNet/PTV datasets. The source code of NIERT is available at https://github.com/DingShizhe/NIERT.
    Human-Inspired Framework to Accelerate Reinforcement Learning. (arXiv:2303.08115v1 [cs.LG])
    While deep reinforcement learning (RL) is becoming an integral part of good decision-making in data science, it is still plagued with sample inefficiency. This can be challenging when applying deep-RL in real-world environments where physical interactions are expensive and can risk system safety. To improve the sample efficiency of RL algorithms, this paper proposes a novel human-inspired framework that facilitates fast exploration and learning for difficult RL tasks. The main idea is to first provide the learning agent with simpler but similar tasks that gradually grow in difficulty and progress toward the main task. The proposed method requires no pre-training phase. Specifically, the learning of simpler tasks is only done for one iteration. The generated knowledge could be used by any transfer learning, including value transfer and policy transfer, to reduce the sample complexity while not adding to the computational complexity. So, it can be applied to any goal, environment, and reinforcement learning algorithm - both value-based methods and policy-based methods and both tabular methods and deep-RL methods. We have evaluated our proposed framework on both a simple Random Walk for illustration purposes and on more challenging optimal control problems with constraint. The experiments show the good performance of our proposed framework in improving the sample efficiency of RL-learning algorithms, especially when the main task is difficult.
    Regret Lower Bounds for Learning Linear Quadratic Gaussian Systems. (arXiv:2201.01680v3 [cs.LG] UPDATED)
    TWe establish regret lower bounds for adaptively controlling an unknown linear Gaussian system with quadratic costs. We combine ideas from experiment design, estimation theory and a perturbation bound of certain information matrices to derive regret lower bounds exhibiting scaling on the order of magnitude $\sqrt{T}$ in the time horizon $T$. Our bounds accurately capture the role of control-theoretic parameters and we are able to show that systems that are hard to control are also hard to learn to control; when instantiated to state feedback systems we recover the dimensional dependency of earlier work but with improved scaling with system-theoretic constants such as system costs and Gramians. Furthermore, we extend our results to a class of partially observed systems and demonstrate that systems with poor observability structure also are hard to learn to control.
    CNN-based Euler's Elastica Inpainting with Deep Energy and Deep Image Prior. (arXiv:2207.07921v2 [cs.CV] UPDATED)
    Euler's elastica constitute an appealing variational image inpainting model. It minimises an energy that involves the total variation as well as the level line curvature. These components are transparent and make it attractive for shape completion tasks. However, its gradient flow is a singular, anisotropic, and nonlinear PDE of fourth order, which is numerically challenging: It is difficult to find efficient algorithms that offer sharp edges and good rotation invariance. As a remedy, we design the first neural algorithm that simulates inpainting with Euler's Elastica. We use the deep energy concept which employs the variational energy as neural network loss. Furthermore, we pair it with a deep image prior where the network architecture itself acts as a prior. This yields better inpaintings by steering the optimisation trajectory closer to the desired solution. Our results are qualitatively on par with state-of-the-art algorithms on elastica-based shape completion. They combine good rotation invariance with sharp edges. Moreover, we benefit from the high efficiency and effortless parallelisation within a neural framework. Our neural elastica approach only requires 3x3 central difference stencils. It is thus much simpler than other well-performing algorithms for elastica inpainting. Last but not least, it is unsupervised as it requires no ground truth training data.
    Few-Shot Unlearning by Model Inversion. (arXiv:2205.15567v2 [cs.LG] UPDATED)
    We consider a practical scenario of machine unlearning to erase a target dataset, which causes unexpected behavior from the trained model. The target dataset is often assumed to be fully identifiable in a standard unlearning scenario. Such a flawless identification, however, is almost impossible if the training dataset is inaccessible at the time of unlearning. Unlike previous approaches requiring a complete set of targets, we consider few-shot unlearning scenario when only a few samples of target data are available. To this end, we formulate the few-shot unlearning problem specifying intentions behind the unlearning request (e.g., purely unlearning, mislabel correction, privacy protection), and we devise a straightforward framework that (i) retrieves a proxy of the training data via model inversion fully exploiting information available in the context of unlearning; (ii) adjusts the proxy according to the unlearning intention; and (iii) updates the model with the adjusted proxy. We demonstrate that our method using only a subset of target data can outperform the state-of-the-art unlearning methods even with a complete indication of target data.
    Relational Multi-Task Learning: Modeling Relations between Data and Tasks. (arXiv:2303.07666v1 [cs.LG])
    A key assumption in multi-task learning is that at the inference time the multi-task model only has access to a given data point but not to the data point's labels from other tasks. This presents an opportunity to extend multi-task learning to utilize data point's labels from other auxiliary tasks, and this way improves performance on the new task. Here we introduce a novel relational multi-task learning setting where we leverage data point labels from auxiliary tasks to make more accurate predictions on the new task. We develop MetaLink, where our key innovation is to build a knowledge graph that connects data points and tasks and thus allows us to leverage labels from auxiliary tasks. The knowledge graph consists of two types of nodes: (1) data nodes, where node features are data embeddings computed by the neural network, and (2) task nodes, with the last layer's weights for each task as node features. The edges in this knowledge graph capture data-task relationships, and the edge label captures the label of a data point on a particular task. Under MetaLink, we reformulate the new task as a link label prediction problem between a data node and a task node. The MetaLink framework provides flexibility to model knowledge transfer from auxiliary task labels to the task of interest. We evaluate MetaLink on 6 benchmark datasets in both biochemical and vision domains. Experiments demonstrate that MetaLink can successfully utilize the relations among different tasks, outperforming the state-of-the-art methods under the proposed relational multi-task learning setting, with up to 27% improvement in ROC AUC.
    Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition. (arXiv:2211.12712v2 [cs.LG] UPDATED)
    Value Decomposition (VD) aims to deduce the contributions of agents for decentralized policies in the presence of only global rewards, and has recently emerged as a powerful credit assignment paradigm for tackling cooperative Multi-Agent Reinforcement Learning (MARL) problems. One of the main challenges in VD is to promote diverse behaviors among agents, while existing methods directly encourage the diversity of learned agent networks with various strategies. However, we argue that these dedicated designs for agent networks are still limited by the indistinguishable VD network, leading to homogeneous agent behaviors and thus downgrading the cooperation capability. In this paper, we propose a novel Contrastive Identity-Aware learning (CIA) method, explicitly boosting the credit-level distinguishability of the VD network to break the bottleneck of multi-agent diversity. Specifically, our approach leverages contrastive learning to maximize the mutual information between the temporal credits and identity representations of different agents, encouraging the full expressiveness of credit assignment and further the emergence of individualities. The algorithm implementation of the proposed CIA module is simple yet effective that can be readily incorporated into various VD architectures. Experiments on the SMAC benchmarks and across different VD backbones demonstrate that the proposed method yields results superior to the state-of-the-art counterparts. Our code is available at https://github.com/liushunyu/CIA.
    DC-Art-GAN: Stable Procedural Content Generation using DC-GANs for Digital Art. (arXiv:2209.02847v2 [cs.CV] UPDATED)
    Art is an artistic method of using digital technologies as a part of the generative or creative process. With the advent of digital currency and NFTs (Non-Fungible Token), the demand for digital art is growing aggressively. In this manuscript, we advocate the concept of using deep generative networks with adversarial training for a stable and variant art generation. The work mainly focuses on using the Deep Convolutional Generative Adversarial Network (DC-GAN) and explores the techniques to address the common pitfalls in GAN training. We compare various architectures and designs of DC-GANs to arrive at a recommendable design choice for a stable and realistic generation. The main focus of the work is to generate realistic images that do not exist in reality but are synthesised from random noise by the proposed model. We provide visual results of generated animal face images (some pieces of evidence showing a blend of species) along with recommendations for training, architecture and design choices. We also show how training image preprocessing plays a massive role in GAN training.
    Incremental Class Learning using Variational Autoencoders with Similarity Learning. (arXiv:2110.01303v3 [cs.LG] UPDATED)
    Catastrophic forgetting in neural networks during incremental learning remains a challenging problem. Previous research investigated catastrophic forgetting in fully connected networks, with some earlier work exploring activation functions and learning algorithms. Applications of neural networks have been extended to include similarity learning. Understanding how similarity learning loss functions would be affected by catastrophic forgetting is of significant interest. Our research investigates catastrophic forgetting for four well-known similarity-based loss functions during incremental class learning. The loss functions are Angular, Contrastive, Center, and Triplet loss. Our results show that the catastrophic forgetting rate differs across loss functions on multiple datasets. The Angular loss was least affected, followed by Contrastive, Triplet loss, and Center loss with good mining techniques. We implemented three existing incremental learning techniques, iCaRL, EWC, and EBLL. We further proposed a novel technique using Variational Autoencoders (VAEs) to generate representation as exemplars passed through the network's intermediate layers. Our method outperformed three existing state-of-the-art techniques. We show that one does not require stored images (exemplars) for incremental learning with similarity learning. The generated representations from VAEs help preserve regions of the embedding space used by prior knowledge so that new knowledge does not ``overwrite'' it.
    Transfer Learning for Real-time Deployment of a Screening Tool for Depression Detection Using Actigraphy. (arXiv:2303.07847v1 [cs.LG])
    Automated depression screening and diagnosis is a highly relevant problem today. There are a number of limitations of the traditional depression detection methods, namely, high dependence on clinicians and biased self-reporting. In recent years, research has suggested strong potential in machine learning (ML) based methods that make use of the user's passive data collected via wearable devices. However, ML is data hungry. Especially in the healthcare domain primary data collection is challenging. In this work, we present an approach based on transfer learning, from a model trained on a secondary dataset, for the real time deployment of the depression screening tool based on the actigraphy data of users. This approach enables machine learning modelling even with limited primary data samples. A modified version of leave one out cross validation approach performed on the primary set resulted in mean accuracy of 0.96, where in each iteration one subject's data from the primary set was set aside for testing.
    Sensitive Region-based Metamorphic Testing Framework using Explainable AI. (arXiv:2303.07580v1 [cs.LG])
    Deep Learning (DL) is one of the most popular research topics in machine learning and DL-driven image recognition systems have developed rapidly. Recent research has used metamorphic testing (MT) to detect misclassified images. Most of them discuss metamorphic relations (MR), with little discussion on which regions should be transformed. We focus on the fact that there are sensitive regions where even a small transformation can easily change the prediction results and propose an MT framework that efficiently tests for regions prone to misclassification by transforming the sensitive regions. Our evaluation showed that the sensitive regions can be specified by Explainable AI (XAI) and our framework effectively detects faults.
    Feature representations useful for predicting image memorability. (arXiv:2303.07679v1 [cs.CV])
    Predicting image memorability has attracted interest in various fields. Consequently, prediction accuracy with convolutional neural network (CNN) models has been approaching the empirical upper bound estimated based on human consistency. However, identifying which feature representations embedded in CNN models are responsible for such high prediction accuracy of memorability remains an open question. To tackle this problem, this study sought to identify memorability-related feature representations in CNN models using brain similarity. Specifically, memorability prediction accuracy and brain similarity were examined and assessed by Brain-Score across 16,860 layers in 64 CNN models pretrained for object recognition. A clear tendency was shown in this comprehensive analysis that layers with high memorability prediction accuracy had higher brain similarity with the inferior temporal (IT) cortex, which is the highest stage in the ventral visual pathway. Furthermore, fine-tuning the 64 CNN models revealed that brain similarity with the IT cortex at the penultimate layer was positively correlated with memorability prediction accuracy. This analysis also showed that the best fine-tuned model provided accuracy comparable to the state-of-the-art CNN models developed specifically for memorability prediction. Overall, this study's results indicated that the CNN models' great success in predicting memorability relies on feature representation acquisition similar to the IT cortex. This study advanced our understanding of feature representations and its use for predicting image memorability.
    eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers. (arXiv:2211.01324v5 [cs.CV] UPDATED)
    Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly relies on the text prompt to generate text-aligned content, while later, the text conditioning is almost entirely ignored. This suggests that sharing model parameters throughout the entire generation process may not be ideal. Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages. To maintain training efficiency, we initially train a single model, which is then split into specialized models that are trained for the specific stages of the iterative generation process. Our ensemble of diffusion models, called eDiff-I, results in improved text alignment while maintaining the same inference computation cost and preserving high visual quality, outperforming previous large-scale text-to-image diffusion models on the standard benchmark. In addition, we train our model to exploit a variety of embeddings for conditioning, including the T5 text, CLIP text, and CLIP image embeddings. We show that these different embeddings lead to different behaviors. Notably, the CLIP image embedding allows an intuitive way of transferring the style of a reference image to the target text-to-image output. Lastly, we show a technique that enables eDiff-I's "paint-with-words" capability. A user can select the word in the input text and paint it in a canvas to control the output, which is very handy for crafting the desired image in mind. The project page is available at https://deepimagination.cc/eDiff-I/
    DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning. (arXiv:2303.07864v1 [cs.LG])
    Online Class-Incremental (OCI) learning has sparked new approaches to expand the previously trained model knowledge from sequentially arriving data streams with new classes. Unfortunately, OCI learning can suffer from catastrophic forgetting (CF) as the decision boundaries for old classes can become inaccurate when perturbated by new ones. Existing literature have applied the data augmentation (DA) to alleviate the model forgetting, while the role of DA in OCI has not been well understood so far. In this paper, we theoretically show that augmented samples with lower correlation to the original data are more effective in preventing forgetting. However, aggressive augmentation may also reduce the consistency between data and corresponding labels, which motivates us to exploit proper DA to boost the OCI performance and prevent the CF problem. We propose the Enhanced Mixup (EnMix) method that mixes the augmented samples and their labels simultaneously, which is shown to enhance the sample diversity while maintaining strong consistency with corresponding labels. Further, to solve the class imbalance problem, we design an Adaptive Mixup (AdpMix) method to calibrate the decision boundaries by mixing samples from both old and new classes and dynamically adjusting the label mixing ratio. Our approach is demonstrated to be effective on several benchmark datasets through extensive experiments, and it is shown to be compatible with other replay-based techniques.
    On the Implicit Geometry of Cross-Entropy Parameterizations for Label-Imbalanced Data. (arXiv:2303.07608v1 [cs.LG])
    Various logit-adjusted parameterizations of the cross-entropy (CE) loss have been proposed as alternatives to weighted CE for training large models on label-imbalanced data far beyond the zero train error regime. The driving force behind those designs has been the theory of implicit bias, which for linear(ized) models, explains why they successfully induce bias on the optimization path towards solutions that favor minorities. Aiming to extend this theory to non-linear models, we investigate the implicit geometry of classifiers and embeddings that are learned by different CE parameterizations. Our main result characterizes the global minimizers of a non-convex cost-sensitive SVM classifier for the unconstrained features model, which serves as an abstraction of deep nets. We derive closed-form formulas for the angles and norms of classifiers and embeddings as a function of the number of classes, the imbalance and the minority ratios, and the loss hyperparameters. Using these, we show that logit-adjusted parameterizations can be appropriately tuned to learn symmetric geometries irrespective of the imbalance ratio. We complement our analysis with experiments and an empirical study of convergence accuracy in deep-nets.
    DBSCAN of Multi-Slice Clustering for three-order Tensor. (arXiv:2303.07768v1 [cs.LG])
    Several methods for triclustering three-dimensional data require the cluster size or the number of clusters in each dimension to be specified. To address this issue, the Multi-Slice Clustering (MSC) for 3-order tensor finds signal slices that lie in a low dimensional subspace for a rank-one tensor dataset in order to find a cluster based on the threshold similarity. We propose an extension algorithm called MSC-DBSCAN to extract the different clusters of slices that lie in the different subspaces from the data if the dataset is a sum of r rank-one tensor (r > 1). Our algorithm uses the same input as the MSC algorithm and can find the same solution for rank-one tensor data as MSC.
    Sample-efficient Adversarial Imitation Learning. (arXiv:2303.07846v1 [cs.LG])
    Imitation learning, in which learning is performed by demonstration, has been studied and advanced for sequential decision-making tasks in which a reward function is not predefined. However, imitation learning methods still require numerous expert demonstration samples to successfully imitate an expert's behavior. To improve sample efficiency, we utilize self-supervised representation learning, which can generate vast training signals from the given data. In this study, we propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations that are robust to diverse distortions and temporally predictive, on non-image control tasks. In particular, in comparison with existing self-supervised learning methods for tabular data, we propose a different corruption method for state and action representations that is robust to diverse distortions. We theoretically and empirically observe that making an informative feature manifold with less sample complexity significantly improves the performance of imitation learning. The proposed method shows a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations with varying optimality to provide insights into a range of factors.
    Solar Power Prediction Using Machine Learning. (arXiv:2303.07875v1 [cs.LG])
    This paper presents a machine learning-based approach for predicting solar power generation with high accuracy using a 99% AUC (Area Under the Curve) metric. The approach includes data collection, pre-processing, feature selection, model selection, training, evaluation, and deployment. High-quality data from multiple sources, including weather data, solar irradiance data, and historical solar power generation data, are collected and pre-processed to remove outliers, handle missing values, and normalize the data. Relevant features such as temperature, humidity, wind speed, and solar irradiance are selected for model training. Support Vector Machines (SVM), Random Forest, and Gradient Boosting are used as machine learning algorithms to produce accurate predictions. The models are trained on a large dataset of historical solar power generation data and other relevant features. The performance of the models is evaluated using AUC and other metrics such as precision, recall, and F1-score. The trained machine learning models are then deployed in a production environment, where they can be used to make real-time predictions about solar power generation. The results show that the proposed approach achieves a 99% AUC for solar power generation prediction, which can help energy companies better manage their solar power systems, reduce costs, and improve energy efficiency.
    HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones. (arXiv:2303.07538v1 [cs.LG])
    Modern noise-cancelling headphones have significantly improved users' auditory experiences by removing unwanted background noise, but they can also block out sounds that matter to users. Machine learning (ML) models for sound event detection (SED) and speaker identification (SID) can enable headphones to selectively pass through important sounds; however, implementing these models for a user-centric experience presents several unique challenges. First, most people spend limited time customizing their headphones, so the sound detection should work reasonably well out of the box. Second, the models should be able to learn over time the specific sounds that are important to users based on their implicit and explicit interactions. Finally, such models should have a small memory footprint to run on low-power headphones with limited on-chip memory. In this paper, we propose addressing these challenges using HiSSNet (Hierarchical SED and SID Network). HiSSNet is an SEID (SED and SID) model that uses a hierarchical prototypical network to detect both general and specific sounds of interest and characterize both alarm-like and speech sounds. We show that HiSSNet outperforms an SEID model trained using non-hierarchical prototypical networks by 6.9 - 8.6 percent. When compared to state-of-the-art (SOTA) models trained specifically for SED or SID alone, HiSSNet achieves similar or better performance while reducing the memory footprint required to support multiple capabilities on-device.  ( 2 min )
    FPTN: Fast Pure Transformer Network for Traffic Flow Forecasting. (arXiv:2303.07685v1 [cs.LG])
    Traffic flow forecasting is challenging due to the intricate spatio-temporal correlations in traffic flow data. Existing Transformer-based methods usually treat traffic flow forecasting as multivariate time series (MTS) forecasting. However, too many sensors can cause a vector with a dimension greater than 800, which is difficult to process without information loss. In addition, these methods design complex mechanisms to capture spatial dependencies in MTS, resulting in slow forecasting speed. To solve the abovementioned problems, we propose a Fast Pure Transformer Network (FPTN) in this paper. First, the traffic flow data are divided into sequences along the sensor dimension instead of the time dimension. Then, to adequately represent complex spatio-temporal correlations, Three types of embeddings are proposed for projecting these vectors into a suitable vector space. After that, to capture the complex spatio-temporal correlations simultaneously in these vectors, we utilize Transformer encoder and stack it with several layers. Extensive experiments are conducted with 4 real-world datasets and 13 baselines, which demonstrate that FPTN outperforms the state-of-the-art on two metrics. Meanwhile, the computational time of FPTN spent is less than a quarter of other state-of-the-art Transformer-based models spent, and the requirements for computing resources are significantly reduced.  ( 2 min )
    Tuning support vector machines and boosted trees using optimization algorithms. (arXiv:2303.07400v1 [stat.ML])
    Statistical learning methods have been growing in popularity in recent years. Many of these procedures have parameters that must be tuned for models to perform well. Research has been extensive in neural networks, but not for many other learning methods. We looked at the behavior of tuning parameters for support vector machines, gradient boosting machines, and adaboost in both a classification and regression setting. We used grid search to identify ranges of tuning parameters where good models can be found across many different datasets. We then explored different optimization algorithms to select a model across the tuning parameter space. Models selected by the optimization algorithm were compared to the best models obtained through grid search to select well performing algorithms. This information was used to create an R package, EZtune, that automatically tunes support vector machines and boosted trees.  ( 2 min )
    WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminative Analysis. (arXiv:2303.07543v1 [cs.CV])
    Deep neural networks are susceptible to generating overconfident yet erroneous predictions when presented with data beyond known concepts. This challenge underscores the importance of detecting out-of-distribution (OOD) samples in the open world. In this work, we propose a novel feature-space OOD detection score that jointly reasons with both class-specific and class-agnostic information. Specifically, our approach utilizes Whitened Linear Discriminative Analysis to project features into two subspaces - the discriminative and residual subspaces - in which the ID classes are maximally separated and closely clustered, respectively. The OOD score is then determined by combining the deviation from the input data to the ID distribution in both subspaces. The efficacy of our method, named WDiscOOD, is verified on the large-scale ImageNet-1k benchmark, with six OOD datasets that covers a variety of distribution shifts. WDiscOOD demonstrates superior performance on deep classifiers with diverse backbone architectures, including CNN and vision transformer. Furthermore, we also show that our method can more effectively detect novel concepts in representation space trained with contrastive objectives, including supervised contrastive loss and multi-modality contrastive loss.  ( 2 min )
    Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies. (arXiv:2303.07551v1 [cs.LG])
    Recent work has shown the promise of creating generalist, transformer-based, policies for language, vision, and sequential decision-making problems. To create such models, we generally require centralized training objectives, data, and compute. It is of interest if we can more flexibly create generalist policies, by merging together multiple, task-specific, individually trained policies. In this work, we take a preliminary step in this direction through merging, or averaging, subsets of Decision Transformers in weight space trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also propose that when merging policies, we can obtain better results if all policies start from common, pre-trained initializations, while also co-training on shared auxiliary tasks during problem-specific finetuning. In general, we believe research in this direction can help democratize and distribute the process of which forms generally capable agents.  ( 2 min )
    Clustering with Simplicial Complexes. (arXiv:2303.07646v1 [cs.LG])
    In this work, we propose a new clustering algorithm to group nodes in networks based on second-order simplices (aka filled triangles) to leverage higher-order network interactions. We define a simplicial conductance function, which on minimizing, yields an optimal partition with a higher density of filled triangles within the set while the density of filled triangles is smaller across the sets. To this end, we propose a simplicial adjacency operator that captures the relation between the nodes through second-order simplices. This allows us to extend the well-known Cheeger inequality to cluster a simplicial complex. Then, leveraging the Cheeger inequality, we propose the simplicial spectral clustering algorithm. We report results from numerical experiments on synthetic and real-world network data to demonstrate the efficacy of the proposed approach.  ( 2 min )
    General Loss Functions Lead to (Approximate) Interpolation in High Dimensions. (arXiv:2303.07475v1 [stat.ML])
    We provide a unified framework, applicable to a general family of convex losses and across binary and multiclass settings in the overparameterized regime, to approximately characterize the implicit bias of gradient descent in closed form. Specifically, we show that the implicit bias is approximated (but not exactly equal to) the minimum-norm interpolation in high dimensions, which arises from training on the squared loss. In contrast to prior work which was tailored to exponentially-tailed losses and used the intermediate support-vector-machine formulation, our framework directly builds on the primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new approximate equivalences for general convex losses through a novel sensitivity analysis. Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings. Finally, we provide evidence for the tightness of our techniques, which we use to demonstrate the effect of certain loss functions designed for out-of-distribution problems on the closed-form solution.  ( 2 min )
    Reinforcement Learning-based Wavefront Sensorless Adaptive Optics Approaches for Satellite-to-Ground Laser Communication. (arXiv:2303.07516v1 [cs.LG])
    Optical satellite-to-ground communication (OSGC) has the potential to improve access to fast and affordable Internet in remote regions. Atmospheric turbulence, however, distorts the optical beam, eroding the data rate potential when coupling into single-mode fibers. Traditional adaptive optics (AO) systems use a wavefront sensor to improve fiber coupling. This leads to higher system size, cost and complexity, consumes a fraction of the incident beam and introduces latency, making OSGC for internet service impractical. We propose the use of reinforcement learning (RL) to reduce the latency, size and cost of the system by up to $30-40\%$ by learning a control policy through interactions with a low-cost quadrant photodiode rather than a wavefront phase profiling camera. We develop and share an AO RL environment that provides a standardized platform to develop and evaluate RL based on the Strehl ratio, which is correlated to fiber-coupling performance. Our empirical analysis finds that Proximal Policy Optimization (PPO) outperforms Soft-Actor-Critic and Deep Deterministic Policy Gradient. PPO converges to within $86\%$ of the maximum reward obtained by an idealized Shack-Hartmann sensor after training of 250 episodes, indicating the potential of RL to enable efficient wavefront sensorless OSGC.  ( 2 min )
    AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+. (arXiv:2303.07598v1 [cs.CV])
    Unsupervised learning of vision transformers seeks to pretrain an encoder via pretext tasks without labels. Among them is the Masked Image Modeling (MIM) aligned with pretraining of language transformers by predicting masked patches as a pretext task. A criterion in unsupervised pretraining is the pretext task needs to be sufficiently hard to prevent the transformer encoder from learning trivial low-level features not generalizable well to downstream tasks. For this purpose, we propose an Adversarial Positional Embedding (AdPE) approach -- It distorts the local visual structures by perturbing the position encodings so that the learned transformer cannot simply use the locally correlated patches to predict the missing ones. We hypothesize that it forces the transformer encoder to learn more discriminative features in a global context with stronger generalizability to downstream tasks. We will consider both absolute and relative positional encodings, where adversarial positions can be imposed both in the embedding mode and the coordinate mode. We will also present a new MAE+ baseline that brings the performance of the MIM pretraining to a new level with the AdPE. The experiments demonstrate that our approach can improve the fine-tuning accuracy of MAE by $0.8\%$ and $0.4\%$ over 1600 epochs of pretraining ViT-B and ViT-L on Imagenet1K. For the transfer learning task, it outperforms the MAE with the ViT-B backbone by $2.6\%$ in mIoU on ADE20K, and by $3.2\%$ in AP$^{bbox}$ and $1.6\%$ in AP$^{mask}$ on COCO, respectively. These results are obtained with the AdPE being a pure MIM approach that does not use any extra models or external datasets for pretraining. The code is available at https://github.com/maple-research-lab/AdPE.  ( 2 min )
    Testing Causality for High Dimensional Data. (arXiv:2303.07774v1 [cs.LG])
    Determining causal relationship between high dimensional observations are among the most important tasks in scientific discoveries. In this paper, we revisited the \emph{linear trace method}, a technique proposed in~\citep{janzing2009telling,zscheischler2011testing} to infer the causal direction between two random variables of high dimensions. We strengthen the existing results significantly by providing an improved tail analysis in addition to extending the results to nonlinear trace functionals with sharper confidence bounds under certain distributional assumptions. We obtain our results by interpreting the trace estimator in the causal regime as a function over random orthogonal matrices, where the concentration of Lipschitz functions over such space could be applied. We additionally propose a novel ridge-regularized variant of the estimator in \cite{zscheischler2011testing}, and give provable bounds relating the ridge-estimated terms to their ground-truth counterparts. We support our theoretical results with encouraging experiments on synthetic datasets, more prominently, under high-dimension low sample size regime.  ( 2 min )
    Lifelong Learning for Anomaly Detection: New Challenges, Perspectives, and Insights. (arXiv:2303.07557v1 [cs.LG])
    Anomaly detection is of paramount importance in many real-world domains, characterized by evolving behavior. Lifelong learning represents an emerging trend, answering the need for machine learning models that continuously adapt to new challenges in dynamic environments while retaining past knowledge. However, limited efforts are dedicated to building foundations for lifelong anomaly detection, which provides intrinsically different challenges compared to the more widely explored classification setting. In this paper, we face this issue by exploring, motivating, and discussing lifelong anomaly detection, trying to build foundations for its wider adoption. First, we explain why lifelong anomaly detection is relevant, defining challenges and opportunities to design anomaly detection methods that deal with lifelong learning complexities. Second, we characterize learning settings and a scenario generation procedure that enables researchers to experiment with lifelong anomaly detection using existing datasets. Third, we perform experiments with popular anomaly detection methods on proposed lifelong scenarios, emphasizing the gap in performance that could be gained with the adoption of lifelong learning. Overall, we conclude that the adoption of lifelong anomaly detection is important to design more robust models that provide a comprehensive view of the environment, as well as simultaneous adaptation and knowledge retention.  ( 2 min )
    DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions. (arXiv:2303.07697v1 [cs.CV])
    For realistic talking head generation, creating natural head motion while maintaining accurate lip synchronization is essential. To fulfill this challenging task, we propose DisCoHead, a novel method to disentangle and control head pose and facial expressions without supervision. DisCoHead uses a single geometric transformation as a bottleneck to isolate and extract head motion from a head-driving video. Either an affine or a thin-plate spline transformation can be used and both work well as geometric bottlenecks. We enhance the efficiency of DisCoHead by integrating a dense motion estimator and the encoder of a generator which are originally separate modules. Taking a step further, we also propose a neural mix approach where dense motion is estimated and applied implicitly by the encoder. After applying the disentangled head motion to a source identity, DisCoHead controls the mouth region according to speech audio, and it blinks eyes and moves eyebrows following a separate driving video of the eye region, via the weight modulation of convolutional neural networks. The experiments using multiple datasets show that DisCoHead successfully generates realistic audio-and-video-driven talking heads and outperforms state-of-the-art methods. Project page: https://deepbrainai-research.github.io/discohead/  ( 2 min )
    SuperMask: Generating High-resolution object masks from multi-view, unaligned low-resolution MRIs. (arXiv:2303.07517v1 [eess.IV])
    Three-dimensional segmentation in magnetic resonance images (MRI), which reflects the true shape of the objects, is challenging since high-resolution isotropic MRIs are rare and typical MRIs are anisotropic, with the out-of-plane dimension having a much lower resolution. A potential remedy to this issue lies in the fact that often multiple sequences are acquired on different planes. However, in practice, these sequences are not orthogonal to each other, limiting the applicability of many previous solutions to reconstruct higher-resolution images from multiple lower-resolution ones. We propose a weakly-supervised deep learning-based solution to generating high-resolution masks from multiple low-resolution images. Our method combines segmentation and unsupervised registration networks by introducing two new regularizations to make registration and segmentation reinforce each other. Finally, we introduce a multi-view fusion method to generate high-resolution target object masks. The experimental results on two datasets show the superiority of our methods. Importantly, the advantage of not using high-resolution images in the training process makes our method applicable to a wide variety of MRI segmentation tasks.  ( 2 min )
    VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation. (arXiv:2303.07578v1 [cs.SD])
    We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system. Our model builds upon disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, speaker and fine-grained $F_0$ and energy features for speech synthesis. We utilize the Indic languages dataset, released for LIMMITS 2023 as part of ICASSP Signal Processing Grand Challenge, to synthesize speech in 3 different languages. Our model supports transferring the language of a speaker while retaining their voice and the native accent of the target language. We utilize the large-parameter RADMMM model for Track $1$ and lightweight VANI model for Track $2$ and $3$ of the competition.  ( 2 min )
    Fast Regularized Discrete Optimal Transport with Group-Sparse Regularizers. (arXiv:2303.07597v1 [cs.LG])
    Regularized discrete optimal transport (OT) is a powerful tool to measure the distance between two discrete distributions that have been constructed from data samples on two different domains. While it has a wide range of applications in machine learning, in some cases the sampled data from only one of the domains will have class labels such as unsupervised domain adaptation. In this kind of problem setting, a group-sparse regularizer is frequently leveraged as a regularization term to handle class labels. In particular, it can preserve the label structure on the data samples by corresponding the data samples with the same class label to one group-sparse regularization term. As a result, we can measure the distance while utilizing label information by solving the regularized optimization problem with gradient-based algorithms. However, the gradient computation is expensive when the number of classes or data samples is large because the number of regularization terms and their respective sizes also turn out to be large. This paper proposes fast discrete OT with group-sparse regularizers. Our method is based on two ideas. The first is to safely skip the computations of the gradients that must be zero. The second is to efficiently extract the gradients that are expected to be nonzero. Our method is guaranteed to return the same value of the objective function as that of the original method. Experiments show that our method is up to 8.6 times faster than the original method without degrading accuracy.  ( 2 min )
    ForDigitStress: A multi-modal stress dataset employing a digital job interview scenario. (arXiv:2303.07742v1 [cs.LG])
    We present a multi-modal stress dataset that uses digital job interviews to induce stress. The dataset provides multi-modal data of 40 participants including audio, video (motion capturing, facial recognition, eye tracking) as well as physiological information (photoplethysmography, electrodermal activity). In addition to that, the dataset contains time-continuous annotations for stress and occurred emotions (e.g. shame, anger, anxiety, surprise). In order to establish a baseline, five different machine learning classifiers (Support Vector Machine, K-Nearest Neighbors, Random Forest, Long-Short-Term Memory Network) have been trained and evaluated on the proposed dataset for a binary stress classification task. The best-performing classifier achieved an accuracy of 88.3% and an F1-score of 87.5%.  ( 2 min )
    AutoTransfer: AutoML with Knowledge Transfer -- An Application to Graph Neural Networks. (arXiv:2303.07669v1 [cs.LG])
    AutoML has demonstrated remarkable success in finding an effective neural architecture for a given machine learning task defined by a specific dataset and an evaluation metric. However, most present AutoML techniques consider each task independently from scratch, which requires exploring many architectures, leading to high computational cost. Here we propose AutoTransfer, an AutoML solution that improves search efficiency by transferring the prior architectural design knowledge to the novel task of interest. Our key innovation includes a task-model bank that captures the model performance over a diverse set of GNN architectures and tasks, and a computationally efficient task embedding that can accurately measure the similarity among different tasks. Based on the task-model bank and the task embeddings, we estimate the design priors of desirable models of the novel task, by aggregating a similarity-weighted sum of the top-K design distributions on tasks that are similar to the task of interest. The computed design priors can be used with any AutoML search algorithm. We evaluate AutoTransfer on six datasets in the graph machine learning domain. Experiments demonstrate that (i) our proposed task embedding can be computed efficiently, and that tasks with similar embeddings have similar best-performing architectures; (ii) AutoTransfer significantly improves search efficiency with the transferred design priors, reducing the number of explored architectures by an order of magnitude. Finally, we release GNN-Bank-101, a large-scale dataset of detailed GNN training information of 120,000 task-model combinations to facilitate and inspire future research.  ( 2 min )
    Machine Learning Computer Vision Applications for Spatial AI Object Recognition in Orange County, California. (arXiv:2303.07560v1 [cs.CV])
    We provide an integrated and systematic automation approach to spatial object recognition and positional detection using AI machine learning and computer vision algorithms for Orange County, California. We describe a comprehensive methodology for multi-sensor, high-resolution field data acquisition, along with post-field processing and pre-analysis processing tasks. We developed a series of algorithmic formulations and workflows that integrate convolutional deep neural network learning with detected object positioning estimation in 360{\deg} equirectancular photosphere imagery. We provide examples of application processing more than 800 thousand cardinal directions in photosphere images across two areas in Orange County, and present detection results for stop-sign and fire hydrant object recognition. We discuss the efficiency and effectiveness of our approach, along with broader inferences related to the performance and implications of this approach for future technological innovations, including automation of spatial data and public asset inventories, and near real-time AI field data systems.  ( 2 min )
    Guided Speech Enhancement Network. (arXiv:2303.07486v1 [eess.AS])
    High quality speech capture has been widely studied for both voice communication and human computer interface reasons. To improve the capture performance, we can often find multi-microphone speech enhancement techniques deployed on various devices. Multi-microphone speech enhancement problem is often decomposed into two decoupled steps: a beamformer that provides spatial filtering and a single-channel speech enhancement model that cleans up the beamformer output. In this work, we propose a speech enhancement solution that takes both the raw microphone and beamformer outputs as the input for an ML model. We devise a simple yet effective training scheme that allows the model to learn from the cues of the beamformer by contrasting the two inputs and greatly boost its capability in spatial rejection, while conducting the general tasks of denoising and dereverberation. The proposed solution takes advantage of classical spatial filtering algorithms instead of competing with them. By design, the beamformer module then could be selected separately and does not require a large amount of data to be optimized for a given form factor, and the network model can be considered as a standalone module which is highly transferable independently from the microphone array. We name the ML module in our solution as GSENet, short for Guided Speech Enhancement Network. We demonstrate its effectiveness on real world data collected on multi-microphone devices in terms of the suppression of noise and interfering speech.  ( 2 min )
    Automated Vulnerability Detection in Source Code Using Quantum Natural Language Processing. (arXiv:2303.07525v1 [cs.LG])
    One of the most important challenges in the field of software code audit is the presence of vulnerabilities in software source code. These flaws are highly likely ex-ploited and lead to system compromise, data leakage, or denial of ser-vice. C and C++ open source code are now available in order to create a large-scale, classical machine-learning and quantum machine-learning system for function-level vulnerability identification. We assembled a siz-able dataset of millions of open-source functions that point to poten-tial exploits. We created an efficient and scalable vulnerability detection method based on a deep neural network model Long Short Term Memory (LSTM), and quantum machine learning model Long Short Term Memory (QLSTM), that can learn features extracted from the source codes. The source code is first converted into a minimal intermediate representation to remove the pointless components and shorten the de-pendency. Therefore, We keep the semantic and syntactic information using state of the art word embedding algorithms such as Glove and fastText. The embedded vectors are subsequently fed into the classical and quantum convolutional neural networks to classify the possible vulnerabilities. To measure the performance, we used evaluation metrics such as F1 score, precision, re-call, accuracy, and total execution time. We made a comparison between the results derived from the classical LSTM and quantum LSTM using basic feature representation as well as semantic and syntactic represen-tation. We found that the QLSTM with semantic and syntactic features detects significantly accurate vulnerability and runs faster than its classical counterpart.  ( 2 min )
    Architext: Language-Driven Generative Architecture Design. (arXiv:2303.07519v1 [cs.CL])
    Architectural design is a highly complex practice that involves a wide diversity of disciplines, technologies, proprietary design software, expertise, and an almost infinite number of constraints, across a vast array of design tasks. Enabling intuitive, accessible, and scalable design processes is an important step towards performance-driven and sustainable design for all. To that end, we introduce Architext, a novel semantic generation assistive tool. Architext enables design generation with only natural language prompts, given to large-scale Language Models, as input. We conduct a thorough quantitative evaluation of Architext's downstream task performance, focusing on semantic accuracy and diversity for a number of pre-trained language models ranging from 120 million to 6 billion parameters. Architext models are able to learn the specific design task, generating valid residential layouts at a near 100\% rate. Accuracy shows great improvement when scaling the models, with the largest model (GPT-J) yielding impressive accuracy ranging between 25% to over 80% for different prompt categories. We open source the finetuned Architext models and our synthetic dataset, hoping to inspire experimentation in this exciting area of design research.  ( 2 min )
    Using VAEs to Learn Latent Variables: Observations on Applications in cryo-EM. (arXiv:2303.07487v1 [stat.ML])
    Variational autoencoders (VAEs) are a popular generative model used to approximate distributions. The encoder part of the VAE is used in amortized learning of latent variables, producing a latent representation for data samples. Recently, VAEs have been used to characterize physical and biological systems. In this case study, we qualitatively examine the amortization properties of a VAE used in biological applications. We find that in this application the encoder bears a qualitative resemblance to more traditional explicit representation of latent variables.  ( 2 min )
    Study on the Data Storage Technology of Mini-Airborne Radar Based on Machine Learning. (arXiv:2303.07407v1 [cs.AR])
    The data rate of airborne radar is much higher than the wireless data transfer rate in many detection applications, so the onboard data storage systems are usually used to store the radar data. Data storage systems with good seismic performance usually use NAND Flash as storage medium, and there is a widespread problem of long file management time, which seriously affects the data storage speed, especially under the limitation of platform miniaturization. To solve this problem, a data storage method based on machine learning is proposed for mini-airborne radar. The storage training model is established based on machine learning, and could process various kinds of radar data. The file management methods are classified and determined using the model, and then are applied to the storage of radar data. To verify the performance of the proposed method, a test was carried out on the data storage system of a mini-airborne radar. The experimental results show that the method based on machine learning can form various data storage methods adapted to different data rates and application scenarios. The ratio of the file management time to the actual data writing time is extremely low.  ( 2 min )
    Many learning agents interacting with an agent-based market model. (arXiv:2303.07393v1 [q-fin.TR])
    We consider the dynamics and the interactions of multiple reinforcement learning optimal execution trading agents interacting with a reactive Agent-Based Model (ABM) of a financial market in event time. The model represents a market ecology with 3-trophic levels represented by: optimal execution learning agents, minimally intelligent liquidity takers, and fast electronic liquidity providers. The optimal execution agent classes include buying and selling agents that can either use a combination of limit orders and market orders, or only trade using market orders. The reward function explicitly balances trade execution slippage against the penalty of not executing the order timeously. This work demonstrates how multiple competing learning agents impact a minimally intelligent market simulation as functions of the number of agents, the size of agents' initial orders, and the state spaces used for learning. We use phase space plots to examine the dynamics of the ABM, when various specifications of learning agents are included. Further, we examine whether the inclusion of optimal execution agents that can learn is able to produce dynamics with the same complexity as empirical data. We find that the inclusion of optimal execution agents changes the stylised facts produced by ABM to conform more with empirical data, and are a necessary inclusion for ABMs investigating market micro-structure. However, including execution agents to chartist-fundamentalist-noise ABMs is insufficient to recover the complexity observed in empirical data.  ( 2 min )
    Path Planning using Reinforcement Learning: A Policy Iteration Approach. (arXiv:2303.07535v1 [cs.LG])
    With the impact of real-time processing being realized in the recent past, the need for efficient implementations of reinforcement learning algorithms has been on the rise. Albeit the numerous advantages of Bellman equations utilized in RL algorithms, they are not without the large search space of design parameters. This research aims to shed light on the design space exploration associated with reinforcement learning parameters, specifically that of Policy Iteration. Given the large computational expenses of fine-tuning the parameters of reinforcement learning algorithms, we propose an auto-tuner-based ordinal regression approach to accelerate the process of exploring these parameters and, in return, accelerate convergence towards an optimal policy. Our approach provides 1.82x peak speedup with an average of 1.48x speedup over the previous state-of-the-art.  ( 2 min )
    Tensor-based Multimodal Learning for Prediction of Pulmonary Arterial Wedge Pressure from Cardiac MRI. (arXiv:2303.07540v1 [cs.LG])
    Heart failure is a serious and life-threatening condition that can lead to elevated pressure in the left ventricle. Pulmonary Arterial Wedge Pressure (PAWP) is an important surrogate marker indicating high pressure in the left ventricle. PAWP is determined by Right Heart Catheterization (RHC) but it is an invasive procedure. A non-invasive method is useful in quickly identifying high-risk patients from a large population. In this work, we develop a tensor learning-based pipeline for identifying PAWP from multimodal cardiac Magnetic Resonance Imaging (MRI). This pipeline extracts spatial and temporal features from high-dimensional scans. For quality control, we incorporate an epistemic uncertainty-based binning strategy to identify poor-quality training samples. To improve the performance, we learn complementary information by integrating features from multimodal data: cardiac MRI with short-axis and four-chamber views, and Electronic Health Records. The experimental analysis on a large cohort of $1346$ subjects who underwent the RHC procedure for PAWP estimation indicates that the proposed pipeline has a diagnostic value and can produce promising performance with significant improvement over the baseline in clinical practice (i.e., $\Delta$AUC $=0.10$, $\Delta$Accuracy $=0.06$, and $\Delta$MCC $=0.39$). The decision curve analysis further confirms the clinical utility of our method.  ( 2 min )
    Unsupervised Representation Learning in Partially Observable Atari Games. (arXiv:2303.07437v1 [cs.LG])
    State representation learning aims to capture latent factors of an environment. Contrastive methods have performed better than generative models in previous state representation learning research. Although some researchers realize the connections between masked image modeling and contrastive representation learning, the effort is focused on using masks as an augmentation technique to represent the latent generative factors better. Partially observable environments in reinforcement learning have not yet been carefully studied using unsupervised state representation learning methods. In this article, we create an unsupervised state representation learning scheme for partially observable states. We conducted our experiment on a previous Atari 2600 framework designed to evaluate representation learning models. A contrastive method called Spatiotemporal DeepInfomax (ST-DIM) has shown state-of-the-art performance on this benchmark but remains inferior to its supervised counterpart. Our approach improves ST-DIM when the environment is not fully observable and achieves higher F1 scores and accuracy scores than the supervised learning counterpart. The mean accuracy score averaged over categories of our approach is ~66%, compared to ~38% of supervised learning. The mean F1 score is ~64% to ~33%.  ( 2 min )
    Efficient Bayesian Physics Informed Neural Networks for Inverse Problems via Ensemble Kalman Inversion. (arXiv:2303.07392v1 [stat.ML])
    Bayesian Physics Informed Neural Networks (B-PINNs) have gained significant attention for inferring physical parameters and learning the forward solutions for problems based on partial differential equations. However, the overparameterized nature of neural networks poses a computational challenge for high-dimensional posterior inference. Existing inference approaches, such as particle-based or variance inference methods, are either computationally expensive for high-dimensional posterior inference or provide unsatisfactory uncertainty estimates. In this paper, we present a new efficient inference algorithm for B-PINNs that uses Ensemble Kalman Inversion (EKI) for high-dimensional inference tasks. We find that our proposed method can achieve inference results with informative uncertainty estimates comparable to Hamiltonian Monte Carlo (HMC)-based B-PINNs with a much reduced computational cost. These findings suggest that our proposed approach has great potential for uncertainty quantification in physics-informed machine learning for practical applications.  ( 2 min )
    Loss of Plasticity in Continual Deep Reinforcement Learning. (arXiv:2303.07507v1 [cs.LG])
    The ability to learn continually is essential in a complex and changing world. In this paper, we characterize the behavior of canonical value-based deep reinforcement learning (RL) approaches under varying degrees of non-stationarity. In particular, we demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games. This phenomenon is alluded to in prior work under various guises -- e.g., loss of plasticity, implicit under-parameterization, primacy bias, and capacity loss. We investigate this phenomenon closely at scale and analyze how the weights, gradients, and activations change over time in several experiments with varying dimensions (e.g., similarity between games, number of games, number of frames per game), with some experiments spanning 50 days and 2 billion environment interactions. Our analysis shows that the activation footprint of the network becomes sparser, contributing to the diminishing gradients. We investigate a remarkably simple mitigation strategy -- Concatenated ReLUs (CReLUs) activation function -- and demonstrate its effectiveness in facilitating continual learning in a changing environment.  ( 2 min )
    Audio Visual Language Maps for Robot Navigation. (arXiv:2303.07522v1 [cs.RO])
    While interacting in the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. In this work, we propose Audio-Visual-Language Maps (AVLMaps), a unified 3D spatial map representation for storing cross-modal information from audio, visual, and language cues. AVLMaps integrate the open-vocabulary capabilities of multimodal foundation models pre-trained on Internet-scale data by fusing their features into a centralized 3D voxel grid. In the context of navigation, we show that AVLMaps enable robot systems to index goals in the map based on multimodal queries, e.g., textual descriptions, images, or audio snippets of landmarks. In particular, the addition of audio information enables robots to more reliably disambiguate goal locations. Extensive experiments in simulation show that AVLMaps enable zero-shot multimodal goal navigation from multimodal prompts and provide 50% better recall in ambiguous scenarios. These capabilities extend to mobile robots in the real world - navigating to landmarks referring to visual, audio, and spatial concepts. Videos and code are available at: https://avlmaps.github.io.  ( 2 min )
  • Open

    General Loss Functions Lead to (Approximate) Interpolation in High Dimensions. (arXiv:2303.07475v1 [stat.ML])
    We provide a unified framework, applicable to a general family of convex losses and across binary and multiclass settings in the overparameterized regime, to approximately characterize the implicit bias of gradient descent in closed form. Specifically, we show that the implicit bias is approximated (but not exactly equal to) the minimum-norm interpolation in high dimensions, which arises from training on the squared loss. In contrast to prior work which was tailored to exponentially-tailed losses and used the intermediate support-vector-machine formulation, our framework directly builds on the primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new approximate equivalences for general convex losses through a novel sensitivity analysis. Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings. Finally, we provide evidence for the tightness of our techniques, which we use to demonstrate the effect of certain loss functions designed for out-of-distribution problems on the closed-form solution.
    Using VAEs to Learn Latent Variables: Observations on Applications in cryo-EM. (arXiv:2303.07487v1 [stat.ML])
    Variational autoencoders (VAEs) are a popular generative model used to approximate distributions. The encoder part of the VAE is used in amortized learning of latent variables, producing a latent representation for data samples. Recently, VAEs have been used to characterize physical and biological systems. In this case study, we qualitatively examine the amortization properties of a VAE used in biological applications. We find that in this application the encoder bears a qualitative resemblance to more traditional explicit representation of latent variables.
    Tuning support vector machines and boosted trees using optimization algorithms. (arXiv:2303.07400v1 [stat.ML])
    Statistical learning methods have been growing in popularity in recent years. Many of these procedures have parameters that must be tuned for models to perform well. Research has been extensive in neural networks, but not for many other learning methods. We looked at the behavior of tuning parameters for support vector machines, gradient boosting machines, and adaboost in both a classification and regression setting. We used grid search to identify ranges of tuning parameters where good models can be found across many different datasets. We then explored different optimization algorithms to select a model across the tuning parameter space. Models selected by the optimization algorithm were compared to the best models obtained through grid search to select well performing algorithms. This information was used to create an R package, EZtune, that automatically tunes support vector machines and boosted trees.
    Variational Inference with Gaussian Mixture by Entropy Approximation. (arXiv:2202.13059v3 [stat.ML] UPDATED)
    Variational inference is a technique for approximating intractable posterior distributions in order to quantify the uncertainty of machine learning. Although the unimodal Gaussian distribution is usually chosen as a parametric distribution, it hardly approximates the multimodality. In this paper, we employ the Gaussian mixture distribution as a parametric distribution. A main difficulty of variational inference with the Gaussian mixture is how to approximate the entropy of the Gaussian mixture. We approximate the entropy of the Gaussian mixture as the sum of the entropy of the unimodal Gaussian, which can be analytically calculated. In addition, we theoretically analyze the approximation error between the true entropy and approximated one in order to reveal when our approximation works well. Specifically, the approximation error is controlled by the ratios of the distances between the means to the sum of the variances of the Gaussian mixture. Furthermore, it converges to zero when the ratios go to infinity. This situation seems to be more likely to occur in higher dimensional parametric spaces because of the curse of dimensionality. Therefore, our result guarantees that our approximation works well, for example, in neural networks that assume a large number of weights.
    A law of adversarial risk, interpolation, and label noise. (arXiv:2207.03933v3 [stat.ML] UPDATED)
    In supervised learning, it has been shown that label noise in the data can be interpolated without penalties on test accuracy. We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem showing the relationship between label noise and adversarial risk for any data distribution. Our results are almost tight if we do not make any assumptions on the inductive bias of the learning algorithm. We then investigate how different components of this problem affect this result, including properties of the distribution. We also discuss non-uniform label noise distributions; and prove a new theorem showing uniform label noise induces nearly as large an adversarial risk as the worst poisoning with the same noise rate. Then, we provide theoretical and empirical evidence that uniform label noise is more harmful than typical real-world label noise. Finally, we show how inductive biases amplify the effect of label noise and argue the need for future work in this direction.
    Expectation Distance-based Distributional Clustering for Noise-Robustness. (arXiv:2110.08871v4 [cs.LG] UPDATED)
    This paper presents a clustering technique that reduces the susceptibility to data noise by learning and clustering the data-distribution and then assigning the data to the cluster of its distribution. In the process, it reduces the impact of noise on clustering results. This method involves introducing a new distance among distributions, namely the expectation distance (denoted, ED), that goes beyond the state-of-art distribution distance of optimal mass transport (denoted, $W_2$ for $2$-Wasserstein): The latter essentially depends only on the marginal distributions while the former also employs the information about the joint distributions. Using the ED, the paper extends the classical $K$-means and $K$-medoids clustering to those over data-distributions (rather than raw-data) and introduces $K$-medoids using $W_2$. The paper also presents the closed-form expressions of the $W_2$ and ED distance measures. The implementation results of the proposed ED and the $W_2$ distance measures to cluster real-world weather data as well as stock data are also presented, which involves efficiently extracting and using the underlying data distributions -- Gaussians for weather data versus lognormals for stock data. The results show striking performance improvement over classical clustering of raw-data, with higher accuracy realized for ED. Also, not only does the distribution-based clustering offer higher accuracy, but it also lowers the computation time due to reduced time-complexity.
    Regret Lower Bounds for Learning Linear Quadratic Gaussian Systems. (arXiv:2201.01680v3 [cs.LG] UPDATED)
    TWe establish regret lower bounds for adaptively controlling an unknown linear Gaussian system with quadratic costs. We combine ideas from experiment design, estimation theory and a perturbation bound of certain information matrices to derive regret lower bounds exhibiting scaling on the order of magnitude $\sqrt{T}$ in the time horizon $T$. Our bounds accurately capture the role of control-theoretic parameters and we are able to show that systems that are hard to control are also hard to learn to control; when instantiated to state feedback systems we recover the dimensional dependency of earlier work but with improved scaling with system-theoretic constants such as system costs and Gramians. Furthermore, we extend our results to a class of partially observed systems and demonstrate that systems with poor observability structure also are hard to learn to control.
    Exact Selective Inference with Randomization. (arXiv:2212.12940v3 [stat.ME] UPDATED)
    We introduce a pivot for exact selective inference with randomization. Not only does our pivot lead to exact inference in Gaussian regression models, but it is also available in closed form. We reduce the problem of exact selective inference to a bivariate truncated Gaussian distribution. By doing so, we give up some power that is achieved with approximate inference in Panigrahi and Taylor (2022). Yet we always produce narrower confidence intervals than a closely related data-splitting procedure. For popular instances of Gaussian regression, this price -- in terms of power -- in exchange for exact selective inference is demonstrated in simulated experiments and in an HIV drug resistance analysis.
    Causal Rule Ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects. (arXiv:2009.09036v4 [stat.ME] UPDATED)
    In health and social sciences, it is critically important to identify subgroups of the study population where a treatment has notable heterogeneity in the causal effects with respect to the average treatment effect. Data-driven discovery of heterogeneous treatment effects (HTE) via decision tree methods has been proposed for this task. Despite its high interpretability, the single-tree discovery of HTE tends to be highly unstable and to find an oversimplified representation of treatment heterogeneity. To accommodate these shortcomings, we propose Causal Rule Ensemble (CRE), a new method to discover heterogeneous subgroups through an ensemble-of-trees approach. CRE has the following features: 1) provides an interpretable representation of the HTE; 2) allows extensive exploration of complex heterogeneity patterns; and 3) guarantees high stability in the discovery. The discovered subgroups are defined in terms of interpretable decision rules, and we develop a general two-stage approach for subgroup-specific conditional causal effects estimation, providing theoretical guarantees. Via simulations, we show that the CRE method has a strong discovery ability and a competitive estimation performance when compared to state-of-the-art techniques. Finally, we apply CRE to discover subgroups most vulnerable to the effects of exposure to air pollution on mortality for 35.3 million Medicare beneficiaries across the contiguous U.S.
    Pretrained Language Models are Symbolic Mathematics Solvers too!. (arXiv:2110.03501v3 [stat.ML] UPDATED)
    Solving symbolic mathematics has always been of in the arena of human ingenuity that needs compositional reasoning and recurrence. However, recent studies have shown that large-scale language models such as transformers are universal and surprisingly can be trained as a sequence-to-sequence task to solve complex mathematical equations. These large transformer models need humongous amounts of training data to generalize to unseen symbolic mathematics problems. In this paper, we present a sample efficient way of solving the symbolic tasks by first pretraining the transformer model with language translation and then fine-tuning the pretrained transformer model to solve the downstream task of symbolic mathematics. We achieve comparable accuracy on the integration task with our pretrained model while using around $1.5$ orders of magnitude less number of training samples with respect to the state-of-the-art deep learning for symbolic mathematics. The test accuracy on differential equation tasks is considerably lower comparing with integration as they need higher order recursions that are not present in language translations. We propose the generalizability of our pretrained language model from Anna Karenina Principle (AKP). We pretrain our model with different pairs of language translations. Our results show language bias in solving symbolic mathematics tasks. Finally, we study the robustness of the fine-tuned model on symbolic math tasks against distribution shift, and our approach generalizes better in distribution shift scenarios for the function integration.
    Fast Regularized Discrete Optimal Transport with Group-Sparse Regularizers. (arXiv:2303.07597v1 [cs.LG])
    Regularized discrete optimal transport (OT) is a powerful tool to measure the distance between two discrete distributions that have been constructed from data samples on two different domains. While it has a wide range of applications in machine learning, in some cases the sampled data from only one of the domains will have class labels such as unsupervised domain adaptation. In this kind of problem setting, a group-sparse regularizer is frequently leveraged as a regularization term to handle class labels. In particular, it can preserve the label structure on the data samples by corresponding the data samples with the same class label to one group-sparse regularization term. As a result, we can measure the distance while utilizing label information by solving the regularized optimization problem with gradient-based algorithms. However, the gradient computation is expensive when the number of classes or data samples is large because the number of regularization terms and their respective sizes also turn out to be large. This paper proposes fast discrete OT with group-sparse regularizers. Our method is based on two ideas. The first is to safely skip the computations of the gradients that must be zero. The second is to efficiently extract the gradients that are expected to be nonzero. Our method is guaranteed to return the same value of the objective function as that of the original method. Experiments show that our method is up to 8.6 times faster than the original method without degrading accuracy.
    Bayes Complexity of Learners vs Overfitting. (arXiv:2303.07874v1 [cs.LG])
    We introduce a new notion of complexity of functions and we show that it has the following properties: (i) it governs a PAC Bayes-like generalization bound, (ii) for neural networks it relates to natural notions of complexity of functions (such as the variation), and (iii) it explains the generalization gap between neural networks and linear schemes. While there is a large set of papers which describes bounds that have each such property in isolation, and even some that have two, as far as we know, this is a first notion that satisfies all three of them. Moreover, in contrast to previous works, our notion naturally generalizes to neural networks with several layers. Even though the computation of our complexity is nontrivial in general, an upper-bound is often easy to derive, even for higher number of layers and functions with structure, such as period functions. An upper-bound we derive allows to show a separation in the number of samples needed for good generalization between 2 and 4-layer neural networks for periodic functions.
    A Diffusion Model Predicts 3D Shapes from 2D Microscopy Images. (arXiv:2208.14125v3 [cs.CV] UPDATED)
    Diffusion models are a special type of generative model, capable of synthesising new data from a learnt distribution. We introduce DISPR, a diffusion-based model for solving the inverse problem of three-dimensional (3D) cell shape prediction from two-dimensional (2D) single cell microscopy images. Using the 2D microscopy image as a prior, DISPR is conditioned to predict realistic 3D shape reconstructions. To showcase the applicability of DISPR as a data augmentation tool in a feature-based single cell classification task, we extract morphological features from the red blood cells grouped into six highly imbalanced classes. Adding features from the DISPR predictions to the three minority classes improved the macro F1 score from $F1_\text{macro} = 55.2 \pm 4.6\%$ to $F1_\text{macro} = 72.2 \pm 4.9\%$. We thus demonstrate that diffusion models can be successfully applied to inverse biomedical problems, and that they learn to reconstruct 3D shapes with realistic morphological features from 2D microscopy images.
    On the Robustness of Text Vectorizers. (arXiv:2303.07203v1 [cs.CL] CROSS LISTED)
    A fundamental issue in natural language processing is the robustness of the models with respect to changes in the input. One critical step in this process is the embedding of documents, which transforms sequences of words or tokens into vector representations. Our work formally proves that popular embedding schemes, such as concatenation, TF-IDF, and Paragraph Vector (a.k.a. doc2vec), exhibit robustness in the H\"older or Lipschitz sense with respect to the Hamming distance. We provide quantitative bounds for these schemes and demonstrate how the constants involved are affected by the length of the document. These findings are exemplified through a series of numerical examples.
    Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice. (arXiv:2303.08102v1 [cs.LG])
    We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions. Improving on previous analyses, we show that the regret in this setting is controlled by information-theoretic quantities that measure the similarity between experts. In some natural special cases, this allows us to obtain the first regret bound for EXP4 that can get arbitrarily close to zero if the experts are similar enough. While for a different algorithm, we provide another bound that describes the similarity between the experts in terms of the KL-divergence, and we show that this bound can be smaller than the one of EXP4 in some cases. Additionally, we provide lower bounds for certain classes of experts showing that the algorithms we analyzed are nearly optimal in some cases.
    Multiway clustering of 3-order tensor via affinity matrix. (arXiv:2303.07757v1 [cs.LG])
    We propose a new method of multiway clustering for 3-order tensors via affinity matrix (MCAM). Based on a notion of similarity between the tensor slices and the spread of information of each slice, our model builds an affinity/similarity matrix on which we apply advanced clustering methods. The combination of all clusters of the three modes delivers the desired multiway clustering. Finally, MCAM achieves competitive results compared with other known algorithms on synthetics and real datasets.
    Style Feature Extraction Using Contrastive Conditioned Variational Autoencoders with Mutual Information Constraints. (arXiv:2303.08068v1 [cs.CV])
    It is crucial to extract fine-grained features such as styles from unlabeled data in data analysis. Unsupervised methods, such as variational autoencoders (VAEs), can extract styles, but the extracted styles are usually mixed with other features. We can isolate the styles using VAEs conditioned by class labels, known as conditional VAEs (CVAEs). However, methods to extract only styles using unlabeled data are not established. In this paper, we construct a CVAE-based method that extracts style features using only unlabeled data. The proposed model roughly consists of two parallel parts; a contrastive learning (CL) part that extracts style-independent features and a CVAE part that extracts style features. CL models generally learn representations independent of data augmentation, which can be seen as a perturbation in styles, in a self-supervised way. Taking the style-independent features as a condition, the CVAE learns to extract only styles. In the training procedure, a CL model is trained beforehand, and then the CVAE is trained while the CL model is fixed. Additionally, to prevent the CVAE from learning to ignore the condition and failing to extract only styles, we introduce a constraint based on mutual information between the CL features and the VAE features. Experiments on two simple datasets, MNIST and an original dataset based on Google Fonts, show that the proposed method efficiently extracts style features. Further experiments using real-world natural image datasets also show the method's extendability.
    Efficient Bayesian Physics Informed Neural Networks for Inverse Problems via Ensemble Kalman Inversion. (arXiv:2303.07392v1 [stat.ML])
    Bayesian Physics Informed Neural Networks (B-PINNs) have gained significant attention for inferring physical parameters and learning the forward solutions for problems based on partial differential equations. However, the overparameterized nature of neural networks poses a computational challenge for high-dimensional posterior inference. Existing inference approaches, such as particle-based or variance inference methods, are either computationally expensive for high-dimensional posterior inference or provide unsatisfactory uncertainty estimates. In this paper, we present a new efficient inference algorithm for B-PINNs that uses Ensemble Kalman Inversion (EKI) for high-dimensional inference tasks. We find that our proposed method can achieve inference results with informative uncertainty estimates comparable to Hamiltonian Monte Carlo (HMC)-based B-PINNs with a much reduced computational cost. These findings suggest that our proposed approach has great potential for uncertainty quantification in physics-informed machine learning for practical applications.
    Best arm identification in rare events. (arXiv:2303.07627v1 [cs.LG])
    We consider the best arm identification problem in the stochastic multi-armed bandit framework where each arm has a tiny probability of realizing large rewards while with overwhelming probability the reward is zero. A key application of this framework is in online advertising where click rates of advertisements could be a fraction of a single percent and final conversion to sales, while highly profitable, may again be a small fraction of the click rates. Lately, algorithms for BAI problems have been developed that minimise sample complexity while providing statistical guarantees on the correct arm selection. As we observe, these algorithms can be computationally prohibitive. We exploit the fact that the reward process for each arm is well approximated by a Compound Poisson process to arrive at algorithms that are faster, with a small increase in sample complexity. We analyze the problem in an asymptotic regime as rarity of reward occurrence reduces to zero, and reward amounts increase to infinity. This helps illustrate the benefits of the proposed algorithm. It also sheds light on the underlying structure of the optimal BAI algorithms in the rare event setting.
    Explanation Shift: Investigating Interactions between Models and Shifting Data Distributions. (arXiv:2303.08081v1 [cs.LG])
    As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In practice, new input data tend to come without target labels. Then, state-of-the-art techniques model input data distributions or model prediction distributions and try to understand issues regarding the interactions between learned models and shifting distributions. We suggest a novel approach that models how explanation characteristics shift when affected by distribution shifts. We find that the modeling of explanation shifts can be a better indicator for detecting out-of-distribution model behaviour than state-of-the-art techniques. We analyze different types of distribution shifts using synthetic examples and real-world data sets. We provide an algorithmic method that allows us to inspect the interaction between data set features and learned models and compare them to the state-of-the-art. We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
    Deep Learning-Based Estimation and Goodness-of-Fit for Large-Scale Confirmatory Item Factor Analysis. (arXiv:2109.09500v2 [stat.ML] UPDATED)
    We investigate novel parameter estimation and goodness-of-fit (GOF) assessment methods for large-scale confirmatory item factor analysis (IFA) with many respondents, items, and latent factors. For parameter estimation, we extend Urban and Bauer's (2021) deep learning algorithm for exploratory IFA to the confirmatory setting by showing how to handle constraints on loadings and factor correlations. For GOF assessment, we explore simulation-based tests and indices that extend the classifier two-sample test (C2ST), a method that tests whether a deep neural network can distinguish between observed data and synthetic data sampled from a fitted IFA model. Proposed extensions include a test of approximate fit wherein the user specifies what percentage of observed and synthetic data should be distinguishable as well as a relative fit index (RFI) that is similar in spirit to the RFIs used in structural equation modeling. Via simulation studies, we show that: (1) the confirmatory extension of Urban and Bauer's (2021) algorithm obtains comparable estimates to a state-of-the-art estimation procedure in less time; (2) C2ST-based GOF tests control the empirical type I error rate and detect when the latent dimensionality is misspecified; and (3) the sampling distribution of the C2ST-based RFI depends on the sample size.
    Fast Rates for Maximum Entropy Exploration. (arXiv:2303.08059v1 [stat.ML])
    We consider the reinforcement learning (RL) setting, in which the agent has to act in unknown environment driven by a Markov Decision Process (MDP) with sparse or even reward free signals. In this situation, exploration becomes the main challenge. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization that was previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose an algorithm based on a game theoretic representation that has $\widetilde{\mathcal{O}}(H^3 S^2 A / \varepsilon^2)$ sample complexity thus improving the $\varepsilon$-dependence of Hazan et al. (2019), where $S$ is a number of states, $A$ is a number of actions, $H$ is an episode length, and $\varepsilon$ is a desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to the entropy-regularized MDPs, and we propose a simple modification of the UCBVI algorithm that has a sample complexity of order $\widetilde{\mathcal{O}}(1/\varepsilon)$ ignoring dependence in $S, A, H$. Interestingly enough, it is the first theoretical result in RL literature establishing that the exploration problem for the regularized MDPs can be statistically strictly easier (in terms of sample complexity) than for the ordinary MDPs.
    Nonparametric Multi-shape Modeling with Uncertainty Quantification. (arXiv:2206.09127v3 [stat.ML] UPDATED)
    The modeling and uncertainty quantification of closed curves is an important problem in the field of shape analysis, and can have significant ramifications for subsequent statistical tasks. Many of these tasks involve collections of closed curves, which often exhibit structural similarities at multiple levels. Modeling multiple closed curves in a way that efficiently incorporates such between-curve dependence remains a challenging problem. In this work, we propose and investigate a multiple-output (a.k.a. multi-output), multi-dimensional Gaussian process modeling framework. We illustrate the proposed methodological advances, and demonstrate the utility of meaningful uncertainty quantification, on several curve and shape-related tasks. This model-based approach not only addresses the problem of inference on closed curves (and their shapes) with kernel constructions, but also opens doors to nonparametric modeling of multi-level dependence for functional objects in general.
    Axiomatic characterization of pointwise Shapley decompositions. (arXiv:2303.07773v1 [q-fin.MF])
    A common problem in various applications is the additive decomposition of the output of a function with respect to its input variables. Functions with binary arguments can be axiomatically decomposed by the famous Shapley value. For the decomposition of functions with real arguments, a popular method is the pointwise application of the Shapley value on the domain. However, this pointwise application largely ignores the overall structure of functions. In this paper, axioms are developed which fully preserve functional structures and lead to unique decompositions for all Borel measurable functions.
    DBSCAN of Multi-Slice Clustering for three-order Tensor. (arXiv:2303.07768v1 [cs.LG])
    Several methods for triclustering three-dimensional data require the cluster size or the number of clusters in each dimension to be specified. To address this issue, the Multi-Slice Clustering (MSC) for 3-order tensor finds signal slices that lie in a low dimensional subspace for a rank-one tensor dataset in order to find a cluster based on the threshold similarity. We propose an extension algorithm called MSC-DBSCAN to extract the different clusters of slices that lie in the different subspaces from the data if the dataset is a sum of r rank-one tensor (r > 1). Our algorithm uses the same input as the MSC algorithm and can find the same solution for rank-one tensor data as MSC.

  • Open

    [P] We are building a curated list of awesome curated list closely related to machine learning, looking for contributions.
    Hey r/MachineLearning, We are collecting a hand-crafted curated list of awesome curated lists closely related to machine learning. Here is the link to the Github repo: https://github.com/zhimin-z/awesome-awesome-machine-learning Do any lists need to be included from your perspective? Please let me know, or feel free to submit a pull request. The motivation underlying this project is that so many awesome lists regarding machine learning exist on GitHub. But, gradually, it adds a mental burden to memorize where to look for when the ML world is progressing faster and faster these days. Thus, there the project comes, as a unification to sew together all awesome lists closely related to machine learning. submitted by /u/happybirdie007 [link] [comments]  ( 43 min )
    [D] Does anyone have a pdf of Hinton’s talk “Aetherial Symbols”?
    This talk got referenced in something I was reading, and I was really interested in checking it out, but the links all seem to this https://drive.google.com/file/d/0B8i61jl8OE3XdHRCSkV1VFNqTWc/view, which is no longer publicly accessible. I was wondering if anyone had a copy somewhere submitted by /u/The_Sundark [link] [comments]  ( 43 min )
    [N] FastKafka - free open source python lib for building Kafka-based services
    We were searching for something like FastAPI for Kafka-based service we were developing, but couldn’t find anything similar. So we shamelessly made one by reusing beloved paradigms from FastAPI and we shamelessly named it FastKafka. The point was to set the expectations right - you get pretty much what you would expect: function decorators for consumers and producers with type hints specifying Pydantic classes for JSON encoding/decoding, automatic message routing to Kafka brokers and documentation generation. Please take a look and tell us how to make it better. Our goal is to make using it as easy as possible for someone with experience with FastAPI. https://github.com/airtai/fastkafka submitted by /u/davorrunje [link] [comments]  ( 43 min )
    [News] OpenAI Announced GPT-4
    Research blog: https://openai.com/research/gpt-4 Product demo: https://openai.com/product/gpt-4 Research report: https://cdn.openai.com/papers/gpt-4.pdf API waitlist: https://openai.com/waitlist/gpt-4-api Twitter announcement: https://twitter.com/OpenAI/status/1635687373060317185 OpenAI developer livestream: https://www.youtube.com/watch?v=outcGtbnMuQ submitted by /u/shitty-greentext [link] [comments]  ( 48 min )
    [D] Query regarding regression
    I performed linear regression on this dataset - https://www.kaggle.com/datasets/yasserh/uber-fares-dataset But I have some questions / concerns - Some of the top solutions have derived the time (month number, day of the week [mon = 1,...]) but while doing regression they didn't do one hot encoding for that even though it is a categorical data. They have also used year in the regression but I fail to see how that will be relevant to the model Now I made the following attributes weekday check (binary field) - 1 for weekday, 0 for weekend time of the day - based on the pickup time I divided the data in 3 parts 00;00 - 08:00 - morning, 08:00 - 16:00 - afternoon, 16:00 - 24:00 - night quarter - divided the months in quarters - Q1 from Jan to March ..... The problem I'm facing is how to do correlation between the variables since some variables are continuous, some are binary and some are converted after one hot encoding. Also, after doing regression I'm getting really low P-values. Like for example = for distance travelled - P val is = 0 (straight up 0) for others as well it is E-40, E-120, stuff like that Is this right ? cause such P values are really too good to be true. Thank you so much !! submitted by /u/zoro_245 [link] [comments]  ( 44 min )
    [D]Query on the uniqueness of GPT-based chatbots
    I have this question bugging me, and I'm a noob to this. So, if ChatGPT and the likes are all LLMs, built on GPT, and are trained with the same data like from Github, Wikipedia and such, won't they be giving more or less the same answer if each is separately asked the same question? submitted by /u/datanutting [link] [comments]  ( 43 min )
    [D] On research directions being "out of date"
    For the papers we have submitted in recent years, there has been a significant increase in the number of reviewers whose only complaint is the paper not following a "hip" version of the research topic. They don't care about the results and don't care about the merit of the work, their problem is that our work does not follow the trend. It feels like there is this subset of reviewers see anything that is more than a year old as "out of date" and a reason for rejection. Have we been unlucky with our reviewer bingo recently or is this the case for others as well? submitted by /u/redlow0992 [link] [comments]  ( 7 min )
    Interesting sources on Anomaly Detection [R]
    Hi everyone! I would like to deepen my knowledge of the field of Anomaly Detection, both for professional reasons and pure curiosity. Do you know some sources (of any kind, may be books, web, articles, etc.) that I can study? My goal is to gain a profound knowledge of the matter, so also very specific material is welcome. Thank you! ​ p.s. For background, I am a Data Scientist and have a Master's Degree in Mathematical Engineering submitted by /u/zero_redditer [link] [comments]  ( 43 min )
    [P] Enriched Huggingface dataset (+embeddings, baseline, edge cases) for the DCASE Anomalous Sound Detection challenge
    Hey r/MachineLearning, the DCASE sound event detection challenges have recently started! Generally speaking, challenges are a big part of the ML community. These are typically very model-centric: The dataset is given in terms of datapoints/labels and the evaluation is purely quantitatively. In real-world use cases, it is often a better idea to iterate on the data (data-centric AI, DCAI). We believe that this view can also be beneficial in a challenge setting. In order to popularize this DCAI approach, we have built an enriched Huggingface dataset for the DCASE Task2 Challenge: https://huggingface.co/datasets/renumics/dcase23-task2-enriched https://preview.redd.it/wtv1b9ai7pna1.png?width=1920&format=png&auto=webp&s=0d6e8aa841c8b1ddf4a59091f8db2fee31c0b8ff The dataset can be loaded with a few lines of code and allows you to quickly: Understand the data distribution based on embeddings and manual inspection Understand critical data points based on baseline and anomaly detection results Leverage the HF model ecosystem for your trainings ​ Would love to hear honest feedback on this. If you find concrete problems in the workflow, feel free to submit an issue on our Github: https://github.com/Renumics/spotlight We are currently thinking which benchmark datasets we should do next. Is there a dataset that you could recommend? Best, Stefan submitted by /u/44sps [link] [comments]  ( 43 min )
    [D] 2022 State of Competitive ML -- The Downfall of TensorFlow
    It's shocking to see just how far TensorFlow has fallen. The 2022 state of competitive machine learning report came out recently and paints a very grim picture -- only 4% of winning projects are built with TensorFlow. This starkly contrasts with a few years ago, when TensorFlow owned the deep learning landscape. Overall, poor architectural decisions led to abandonment from the community, and a monopoly-style view of ML led to a further lack of adoption from necessary tool chains in the ML ecosystem. The TensorFlow team tried to fix all of this with the TensorFlow v2 refactor, but it was too little, too late, and it abandoned the core piece TensorFlow was still holding on to — legacy systems. Check out more here: https://medium.com/@markurtz/2022-state-of-competitive-ml-the-downfall-of-tensorflow-e2577c499a4d submitted by /u/markurtz [link] [comments]  ( 50 min )
    "[D]" ,"[R]" Applications of Deep belief Networks
    What will be the future of deep belief networks, RBM in the current ML research? What are the advantage of these compared to existing models ? Are there any practical applications of these methods? submitted by /u/frodo_mavinchotil [link] [comments]  ( 42 min )
    [D] NLP - Merging token embeddings for smaller input sizes
    We all know that one of the main problems with current LLMs is their limited input size. However, for certain applications like code modeling, joining common tokens into a single one can make sense and reduce the vocabulary drastically. Example: if you are modeling Python code, probably you can consider `import` as a single token, instead of having two tokens like `im` + `port`. Does this work in practice? Are there any resources on this? Maybe averaging the tokens into a single embedding and adding that to the vocabulary and tokenizer is enough? I've seen some work on token merging for images, but not for text. Thank you in advance! submitted by /u/JClub [link] [comments]  ( 44 min )
    [R] Training Small Diffusion Model
    Does anyone have experience training a small diffusion model conditioned on text captions from scratch on 64x64 images or possibly even smaller? I would like to run it only on images of text to see if it is able to render text. How long would this potentially take if I ran it on 1-2 GPUs? Is this something that’s even possible? submitted by /u/crappr [link] [comments]  ( 43 min )
    [P] Build a Question Answer system/chat bot trained on documentation.
    Hi everyone! I'm working on a side project for my company where the goal is to train an ML model on the company's documentation. We should then be able to ask it any question based on the docs and it should generate a concise response( something like what chatgpt does). How can I achieve this? Thanks you in advance :) submitted by /u/Haunting-King7640 [link] [comments]  ( 44 min )
    [D] Comparing models implemented in PyTorch and Tensorflow
    Hola! I am working on comparing some models, few of which have been implemented in PyTorch and the rest of them in Tensorflow (some in 1.x and others in 2.x versions). I know if they are implemented well, one should be able to simply compare their graphs/performances regardless of the platform. But often there might be some subtle differences in the implementations (within the platforms themselves and the way model code utilizes it) that can make it painful to trust the training. Some models are from official sources so I'd rather not verify much of their code before using them. Of course, I don't want to implement all of them into a single platform unless I must. If you have come across such a problem, how have you dealt with it? Are there certain tests you would conduct to ensure the loss curves can be compared? How would you go about this issue other than finding someone else's implementation of say, a TF model in PyTorch, and verifying it? Sincerely, A man in crisis. submitted by /u/chaotycmunkey [link] [comments]  ( 7 min )
    Productionize training pipeline vs model artifact? [D]
    Let's say you have ETL, training, and inference pipelines. Is it best practices to promote all pipelines to production (you will have one model artifact in dev env and one model artifact in prod env) or keep the training pipeline in dev and only promote the resulting model artifact + ETL/inference pipelines? Why? submitted by /u/Secret_Valuable_Yes [link] [comments]  ( 43 min )
  • Open

    GPT-4 Has Arrived — Here’s What You Need To Know
    submitted by /u/SupPandaHugger [link] [comments]  ( 41 min )
    Ai Generated 12 Viral Looks of Harry Potter from Different Countries! - Graphics Gaga
    submitted by /u/psprady [link] [comments]  ( 41 min )
    I just open-sourced CoverLetterGPT.xyz
    submitted by /u/hottown [link] [comments]  ( 41 min )
    Man Reacts to AI Memes
    submitted by /u/VausProd [link] [comments]  ( 41 min )
    Andreessen: "Every kid is going to grow up now with a friend that's a bot. And that bot is going to be with them their whole lives ... It's going to know everything about them. It's going to be able to answer any question ... As close as a machine can get to loving you, it's gonna love you."
    submitted by /u/Farnectarine4825 [link] [comments]  ( 41 min )
    Finally GPT-4 is here!
    submitted by /u/ai-lover [link] [comments]  ( 41 min )
    Google unveils new AI features in Workspace, opens up PaLM API, and brings generative AI to Cloud
    submitted by /u/qptbook [link] [comments]  ( 6 min )
    Kaiber AI generated music video (based on Jucika by Pusztai Pál) for my noise rock song.
    submitted by /u/grondylion [link] [comments]  ( 6 min )
    No Internet? No Problem! Smartphone Can Now Create Images on Its Own
    submitted by /u/webmanpt [link] [comments]  ( 6 min )
    CMU researchers created an AI model that can detect the pose of multiple humans in a room using only WiFi signals.
    submitted by /u/Dalembert [link] [comments]  ( 42 min )
    Visual ChatGPT: Chatbot can now process images
    submitted by /u/Peaking_AI [link] [comments]  ( 41 min )
    The GPT Query
    I have this question bugging me, and I'm a noob to this. So, if ChatGPT and the likes are all LLMs, built on GPT, and are trained with the same data like from Github, Wikipedia and such, won't they be giving more or less the same answer if each is separately asked the same question? submitted by /u/datanutting [link] [comments]  ( 41 min )
    How to Create INSANE AI Art with Just a Few Keywords
    Learn how to create mind-blowing AI art with just a few keywords! This guide will show you how to use an AI model to generate stunning digital art, step by step! https://youtu.be/HmrqjqyxeCo submitted by /u/TheQuestionStation [link] [comments]  ( 41 min )
    So I created a manga using Midjourney... If you’re interested in checking it out, it’s a free download from https://www.comicsauthority.store/product/i-think-my-king-might-be-a-lil-bitch-1-digital-version/
    submitted by /u/MobileFilmmaker [link] [comments]  ( 41 min )
    The Meaning and Impact of High Tech: A Simple Explanation of the Most Advanced Technology
    High Tech term has been used a lot lately but what is High Tech and what is it used for? in simple terms, it refers to the most advanced technology available in any given field or industry. But why is it called High Tech? The term "High" refers to the level of sophistication and complexity involved in the technology and "Tech" is an abbreviation of the word "Technology" in case you didn’t get it which shows the most advanced and complex technology out there. From healthcare and aerospace to telecommunications and entertainment High Tech can be found in all kinds of industries. It has revolutionized the way we live our lives providing us with new products and services such as mobile phones, drones, VA, or even self-driving cars that we could never have imagined before couple years ago. If you're anything like me, you'll be excited to see what new innovations and breakthroughs are on the horizon so let me know your thoughts on High Tech and how far would it go? submitted by /u/TechPioneerAustin [link] [comments]  ( 42 min )
    Gretel and Google Cloud partner on synthetic data for the enterprise
    submitted by /u/Repeat-or [link] [comments]  ( 41 min )
    17 Best ChatGPT Plagiarism Checker Tools
    submitted by /u/webmanpt [link] [comments]  ( 41 min )
    MidJourney V5 releasing any day now: Sneak Peak
    submitted by /u/messyp [link] [comments]  ( 42 min )
  • Open

    Maximize performance and reduce your deep learning training cost with AWS Trainium and Amazon SageMaker
    Today, tens of thousands of customers are building, training, and deploying machine learning (ML) models using Amazon SageMaker to power applications that have the potential to reinvent their businesses and customer experiences. These ML models have been increasing in size and complexity over the last few years, which has led to state-of-the-art accuracies across a […]  ( 9 min )
  • Open

    Future of Education: Application not Regurgitation of Knowledge – Part I
    When I was getting my MBA at the University of Iowa in 1981, my advisor Gary Fethke (who would later serve as University of Iowa interim president and Emeritus Professor in Business Analytics) convinced me to take a PhD class in econometrics.  I think he was trying to punish me or something.  I was totally… Read More »Future of Education: Application not Regurgitation of Knowledge – Part I The post Future of Education: Application not Regurgitation of Knowledge – Part I appeared first on Data Science Central.  ( 23 min )
    Data Science and Machine Learning Mathematical and Statistical Methods
    As a part of my teaching for AI at the University of Oxford, I read a large number of books which are based on the maths of data science.  Data Science and Machine Learning Mathematical and Statistical Methods is a book i recommend if you like the maths of data science. There is a pdf… Read More »Data Science and Machine Learning Mathematical and Statistical Methods The post Data Science and Machine Learning Mathematical and Statistical Methods appeared first on Data Science Central.  ( 20 min )
    DSC Weekly 14 March 2023 – Our Revamped Submission Guidelines
    Announcements Our Revamped Submission Guidelines Since our migration to WordPress, we have been looking to solidify a set of guidelines for writers to look at prior to submitting that will give them a rough idea of the quality standards the editors are looking for. Many of you will be familiar with our Tips and Tricks… Read More »DSC Weekly 14 March 2023 – Our Revamped Submission Guidelines The post DSC Weekly 14 March 2023 – Our Revamped Submission Guidelines appeared first on Data Science Central.  ( 20 min )
  • Open

    GPT-4
    submitted by /u/nickb [link] [comments]  ( 41 min )
  • Open

    Learning from deep learning: a case study of feature discovery and validation in pathology
    Posted by Ellery Wulczyn and Yun Liu, Google Research When a patient is diagnosed with cancer, one of the most important steps is examination of the tumor under a microscope by pathologists to determine the cancer stage and to characterize the tumor. This information is central to understanding clinical prognosis (i.e., likely patient outcomes) and for determining the most appropriate treatment, such as undergoing surgery alone versus surgery plus chemotherapy. Developing machine learning (ML) tools in pathology to assist with the microscopic review represents a compelling research area with many potential applications. Previous studies have shown that ML can accurately identify and classify tumors in pathology images and can even predict patient prognosis using known pathology feature…  ( 92 min )
  • Open

    Has anyone implemented a solution for simple_world_comm, from PettingZoo?
    https://pettingzoo.farama.org/environments/mpe/simple_world_comm/ I've been doing some experimentation with MARL, and it'd be useful to have a baseline to compare to when solving this environment. It seems fairly popular, and was based off of a popular OpenAI paper, so I have to figure someone's got a saved model somewhere, but search engines aren't getting me anywhere. submitted by /u/Efficient_Star_1336 [link] [comments]  ( 41 min )
    Looking for Maintainers/Contributors for Metaworld
    Hey, I'm Jordan Terry, I'm the CEO of the Farama Foundation (farama.org), the maintainers of Gym/Gymnasium, PettingZoo, and a lot of other major open source reinforcement learning libraries. Right now we're working on doing a big push to dramatically overhaul the Metaworld library (https://github.com/Farama-Foundation/Metaworld), including their API, adding documentation, updating the MuJoCo version, and other massive quality of life features. If anyone would be interested in contributing to this work (and getting your name on an upcoming Metaworld 2.0 paper), please contact Reggie at rmclean@farama.org. submitted by /u/jkterry1 [link] [comments]  ( 42 min )
    Why chatgpt needs reinforcement learning
    Hello everyone, I'm a newer for RL and I have some questions after watching the "Reinforcement Learning from Human Feedback: From Zero to chatGPT" course from HuggingFace. Why is RL necessary? Once we have obtained the Reward model, why not directly use it as a loss term and maximize it? What are the benefits and significance of using RL? Is it because the decoder in GPT involves a multi-stage decision-making process? If I have a one-step generation model, such as a GAN in the image field, do I still need RL? submitted by /u/Difficult-Win8257 [link] [comments]  ( 42 min )
    Introducing EasyDreamer: A Simplified PyTorch Re-Implementation of Dreamer-v1
    Hello everyone! I'm excited to share with you our recent re-implementation of the Dreamer-v1 algorithm, called EasyDreamer. As you know, Dreamer is a state-of-the-art reinforcement learning algorithm that achieves impressive results with high sample efficiency. In fact, Dreamer-v3 recently tackled a long-standing challenge of collecting diamonds in Minecraft without the need for human data or curricula. EasyDreamer, is a simplified version of Dreamer using PyTorch. It is designed to be more accessible for researchers and practitioners who are already familiar with PyTorch, allowing them to easily gain a deeper understanding of how the algorithm works and test their own ideas more efficiently. I hope you find it useful and please feel free to check out our repository for more details! ​ https://github.com/kc-ml2/SimpleDreamer submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 42 min )
    How to search the game tree with depth-first search?
    The idea is to use a multi core CPU with highly optimized C++ code to traverse the game tree of TicTacToe. This will allow to win any game. How can i do so? submitted by /u/ManuelRodriguez331 [link] [comments]  ( 42 min )
    Distributed implementation tips
    If you want to implement e.g. a distributed DQN (parallel, not distributional returns), which framework do people use? I have implemented in Ray, but my god it slows my code down massively. I’ve profiled it and the overhead seems to just come from using ray.get to get the data from the distributed actors. I’m hesitant to use their RLlib because it looks super confusing if I want to implement my own custom algorithm, so I’m wondering if anybody has any tips on how better to optimise my code, whether it be a different framework or how to better utilise ray. Currently my workflow is like this: Have N workers gather a batch size of B, pull this to the central algorithm class and store to the replay buffer, and then perform M updates. I found this to be much quicker than how I originally implemented which was where I had actors periodically pushing data to a decentralised buffer and the learner pulling from this buffer, not in sync, but I found this even slower because there is now two rounds of serialisation/deserialisation from when you put the data into the replay buffer and also when you pull it from the replay buffer. submitted by /u/DefinitelyNot4Burner [link] [comments]  ( 43 min )
  • Open

    GPT-4
    We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.  ( 15 min )
  • Open

    Shuffle product
    The shuffle product of two words, w1 and w2, written w1 Ш w2, is the set of all words formed by the letters in w1 and w2, preserving the order of each word’s letters. The name comes from the analogy with doing a riffle shuffle of two decks of cards. For example, bcd Ш ae, […] Shuffle product first appeared on John D. Cook.  ( 6 min )

  • Open

    [P] ControlNetInpaint: No extra training and you can use 📝text +🌌image + 😷mask to generate new images.
    Hi! Here's an open-source implementation I released today for masked ControlNet synthesis, where you can specify the region that will be synthesised using a mask. The content of the synthesised region is controlled via textual and visual guidance as shown in the README. https://github.com/mikonvergence/ControlNetInpaint Here's an example with a prompt of "a red panda sitting on a bench": https://preview.redd.it/4vxsg9sc0lna1.png?width=1860&format=png&auto=webp&s=1b1b4f832c50fe910dae8a25bb58c71fc479d49f submitted by /u/mikonvergence [link] [comments]  ( 43 min )
    [D] ChatGPT without text limits.
    One of the biggest limitations of large language models is the text limit. This limits their use cases and prohibits more ambitious prompts. This was recently resolved by researchers at Google Brain in Alberta, Canada. In their recent paper they describe a new method of using associative memory which removes the text limit and they also prove that some large language models are universal Turing machines. This will pave the way for entire novels being shared with large language models, personal genomes, etc. The paper talks about the use of "associative memory" which is also known as content-addressable memory (CAM). This type of memory allows the system to retrieve data based on its content rather than its location. Unlike traditional memory systems that use specific memory addresses to…  ( 49 min )
    [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003
    According to the authors, the model performs on par with text-davinci-003 in a small scale human study (the five authors of the paper rated model outputs), despite the Alpaca 7B model being much smaller than text-davinci-003. Read the blog post for details. Blog post: https://crfm.stanford.edu/2023/03/13/alpaca.html Demo: https://crfm.stanford.edu/alpaca/ Code: https://github.com/tatsu-lab/stanford_alpaca submitted by /u/dojoteef [link] [comments]  ( 60 min )
    [R] New grand challenge on generative models for medical imaging
    Want to advance generative AI for medical imaging? 🤖 Join us in the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics! This year, we're calling all the GAN gurus, VAE virtuosos, and diffusion dreamers to showcase their generative genius in developing a model that can accurately learn medical imaging statistics apart from creating beautiful synthetic images with low FID. We all know that generative AI produce unpredictable hallucinations, which is why our goal is to create an evaluation benchmark based on domain-specific statistics meaningful to medical imaging. Register now to become a part of this challenge! https://preview.redd.it/gf4a5d2g4jna1.png?width=1024&format=png&auto=webp&s=02aa4b74847f3df46f634afe68a5d8b31936c7f9 submitted by /u/Hot-Sink1872 [link] [comments]  ( 43 min )
    [D] ICML 2023 Paper Reviews
    ICML 2023 paper reviews are supposed to be released soon. According to the website, they should be released on March 13 (anywhere on earth). I thought to create a discussion thread for us to discuss any issue/complain/celebration or anything else. There is so much noise in the reviews every year. Some good work that the authors are proud of might get a low score because of the noisy system, given that ICML is growing so large these years. We should keep in mind that the work is still valuable no matter what the score is. According to the Program Chair's tweet, it seems that only ~91% of the reviews are submitted. Hopefully it will not delay the release of the reviews and the start of the rebuttal. submitted by /u/zy415 [link] [comments]  ( 52 min )
    [R] MathPrompter: Mathematical Reasoning using Large Language Models. New State of the Art on MultiArith ( 78.7% to 92.5%) with Text-Davinci 002
    Paper - https://arxiv.org/abs/2303.05398 submitted by /u/MysteryInc152 [link] [comments]  ( 45 min )
    [R] Universal Instance Perception as Object Discovery and Retrieval (Video Demo)
    submitted by /u/MasterBin-IIAU [link] [comments]  ( 45 min )
  • Open

    Mining the right transition metals in a vast chemical space
    Computational chemists design better ways of discovering and designing materials for energy applications.  ( 9 min )
    New method accelerates data retrieval in huge databases
    Researchers used machine learning to build faster and more efficient hash functions, which are a key component of databases.  ( 10 min )
  • Open

    A help or tip on this problem
    Hey guys, I have a problem at work that consists of predicting whether a certain part will be missing or not in production. I have data on when the parts were produced, details of the parts characteristics, quantity. But the missing parts data is very small, like 3% of the total parts produced and these 3% cause very big problems to us. At first, I thought of training a model to predict the normal production situation and create a some type of anomaly detector, but I don't know if this is the best way. Have you been through something similar or have any help you can give me? submitted by /u/candimmm [link] [comments]  ( 42 min )
    Image reconstruction
    I have a use-case where (say) N RGB input images are used to reconstruct a single RGB output image, using either an Autoencoder, or a U-Net architecture. More concretely, if N = 18, 18 RGB input images are used as input to a CNN which should then predict one target RGB output image. If the spatial width and height are 90, then one input sample might be (18, 3, 90, 90) which is not batch-size = 18! AFAIK, (18, 3, 90, 90) as input to a CNN will reproduce (18, 3, 90, 90) as output, whereas, I want (3, 90, 90) as the desired output. Any idea how to achieve this? submitted by /u/grid_world [link] [comments]  ( 42 min )
    A New AI Research Introduces Multitask Prompt Tuning (MPT) For Transfer Learning
    submitted by /u/Laser_Gladiator [link] [comments]  ( 6 min )
  • Open

    Open-Source AI LabGym Helps Researchers Analyze Animal Behaviors
    submitted by /u/webmanpt [link] [comments]  ( 41 min )
    A Sci-Fi Movie Written and Directed by an Artificial Intelligence! (chatGPT)
    submitted by /u/webmanpt [link] [comments]  ( 42 min )
  • Open

    How VMware built an MLOps pipeline from scratch using GitLab, Amazon MWAA, and Amazon SageMaker
    This post is co-written with Mahima Agarwal, Machine Learning Engineer, and Deepak Mettem, Senior Engineering Manager, at VMware Carbon Black VMware Carbon Black is a renowned security solution offering protection against the full spectrum of modern cyberattacks. With terabytes of data generated by the product, the security analytics team focuses on building machine learning (ML) […]  ( 11 min )
    Few-click segmentation mask labeling in Amazon SageMaker Ground Truth Plus
    Amazon SageMaker Ground Truth Plus is a managed data labeling service that makes it easy to label data for machine learning (ML) applications. One common use case is semantic segmentation, which is a computer vision ML technique that involves assigning class labels to individual pixels in an image. For example, in video frames captured by […]  ( 7 min )
  • Open

    Prime numbers and Taylor’s law
    The previous post commented that although the digits in the decimal representation of π are not random, it is sometimes useful to think of them as random. Similarly, it is often useful to think of prime numbers as being randomly distributed. If prime numbers were samples from a random variable, it would be natural to […] Prime numbers and Taylor’s law first appeared on John D. Cook.  ( 6 min )
  • Open

    What Are Foundation Models?
    The mics were live and tape was rolling in the studio where the Miles Davis Quintet was recording dozens of tunes in 1956 for Prestige Records. When an engineer asked for the next song’s title, Davis shot back, “I’ll play it, and tell you what it is later.” Like the prolific jazz trumpeter and composer, Read article >  ( 9 min )
  • Open

    How to Implement a Data Privacy and Protection Strategy for Remote Teams
    (Image Source) Remote work has skyrocketed in the last three years. And with that comes increased productivity, happier employees, and lower overhead costs. But unfortunately, it’s not all sunshine and rainbows for companies with remote teams. Studies show that employees working from home increase the frequency of cyberattacks by 238%. And with the global average… Read More »How to Implement a Data Privacy and Protection Strategy for Remote Teams The post How to Implement a Data Privacy and Protection Strategy for Remote Teams appeared first on Data Science Central.  ( 23 min )
    It Takes a Village to Protect and Steer Data Flow
    Back in 2016, my screen froze while filling out the Census online. I felt a sense of unease, mainly because I know that time and time again we’ve seen data collected for good intentions later exploited in disturbing and unintended ways. I didn’t know if my data might be taken, by whom, or what it… Read More »It Takes a Village to Protect and Steer Data Flow The post It Takes a Village to Protect and Steer Data Flow appeared first on Data Science Central.  ( 20 min )
  • Open

    Gymnasium MuJoCo Env Resetting Itself?
    So I'm new to using MuJoCo and I never had this kind of problem in the past using openai's gym environments. Currently, I'm having this problem where a gymnasium MuJoCo env seem to be calling its own reset() function, making it impossible for the agent to handle the termination (it will think the episode hasn't ended still). Furthermore, it seems that when I'm using multiple MuJoCo envs (in series not in parallel), the reset() call occurs so often that, for the agent, the environment never ends. Is there a special way to create multiple MuJoCo environments and is this behavior normal? submitted by /u/_ianmi [link] [comments]  ( 42 min )
    "Rewarding Chatbots for Real-World Engagement with Millions of Users", Irvine et al 2023
    submitted by /u/gwern [link] [comments]  ( 41 min )
    Understanding Action Masking in RLlib
    Hello, I'm new to reinforcement learning. For a project I'm working on I created a custom multi-agent ParallelEnv in PettingZoo and plan to train it using RLlib. I am attempting to limit my action space by defining a set of rules that if violated would constitute an invalid action. I understand vaguely that I can use action masking for this, but there doesn't seem to be any clear tutorials on this, and I've tried sifting through code to understand this but am still having difficulties understanding its implementation. Would anyone have any clear demonstrations of action masking in an environment and then how that is used in RLlib during training? Thank in advance! submitted by /u/Ealta1 [link] [comments]  ( 43 min )
  • Open

    Joint Optimization of Energy Consumption and Completion Time in Federated Learning. (arXiv:2209.14900v2 [cs.LG] UPDATED)
    Federated Learning (FL) is an intriguing distributed machine learning approach due to its privacy-preserving characteristics. To balance the trade-off between energy and execution latency, and thus accommodate different demands and application scenarios, we formulate an optimization problem to minimize a weighted sum of total energy consumption and completion time through two weight parameters. The optimization variables include bandwidth, transmission power and CPU frequency of each device in the FL system, where all devices are linked to a base station and train a global model collaboratively. Through decomposing the non-convex optimization problem into two subproblems, we devise a resource allocation algorithm to determine the bandwidth allocation, transmission power, and CPU frequency for each participating device. We further present the convergence analysis and computational complexity of the proposed algorithm. Numerical results show that our proposed algorithm not only has better performance at different weight parameters (i.e., different demands) but also outperforms the state of the art.  ( 2 min )
    Metrizing Fairness. (arXiv:2205.15049v3 [cs.LG] UPDATED)
    We study supervised learning problems for predicting properties of individuals who belong to one of two demographic groups, and we seek predictors that are fair according to statistical parity. This means that the distributions of the predictions within the two groups should be close with respect to the Kolmogorov distance, and fairness is achieved by penalizing the dissimilarity of these two distributions in the objective function of the learning problem. In this paper, we showcase conceptual and computational benefits of measuring unfairness with integral probability metrics (IPMs) other than the Kolmogorov distance. Conceptually, we show that the generator of any IPM can be interpreted as a family of utility functions and that unfairness with respect to this IPM arises if individuals in the two demographic groups have diverging expected utilities. We also prove that the unfairness-regularized prediction loss admits unbiased gradient estimators if unfairness is measured by the squared $\mathcal L^2$-distance or by a squared maximum mean discrepancy. In this case, the fair learning problem is susceptible to efficient stochastic gradient descent (SGD) algorithms. Numerical experiments on real data show that these SGD algorithms outperform state-of-the-art methods for fair learning in that they achieve superior accuracy-unfairness trade-offs -- sometimes orders of magnitude faster. Finally, we identify conditions under which statistical parity can improve prediction accuracy.  ( 2 min )
    A novel notion of barycenter for probability distributions based on optimal weak mass transport. (arXiv:2102.13380v4 [stat.ML] UPDATED)
    We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass by Gozlanet al. (2017) and Backhoff-Veraguas et al. (2020). We provide a theoretical analysis of this object and discuss its interpretation in the light of convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classic optimal transport does) weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited for the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter and our approaches to compute it are illustrated on synthetic examples, validated on 2D real-world data and compared to standard Wasserstein barycenters.  ( 2 min )
    DM-NeRF: 3D Scene Geometry Decomposition and Manipulation from 2D Images. (arXiv:2208.07227v2 [cs.CV] UPDATED)
    In this paper, we study the problem of 3D scene geometry decomposition and manipulation from 2D views. By leveraging the recent implicit neural representation techniques, particularly the appealing neural radiance fields, we introduce an object field component to learn unique codes for all individual objects in 3D space only from 2D supervision. The key to this component is a series of carefully designed loss functions to enable every 3D point, especially in non-occupied space, to be effectively optimized even without 3D labels. In addition, we introduce an inverse query algorithm to freely manipulate any specified 3D object shape in the learned scene representation. Notably, our manipulation algorithm can explicitly tackle key issues such as object collisions and visual occlusions. Our method, called DM-NeRF, is among the first to simultaneously reconstruct, decompose, manipulate and render complex 3D scenes in a single pipeline. Extensive experiments on three datasets clearly show that our method can accurately decompose all 3D objects from 2D views, allowing any interested object to be freely manipulated in 3D space such as translation, rotation, size adjustment, and deformation.  ( 2 min )
    On the Fusion Strategies for Federated Decision Making. (arXiv:2303.06109v1 [cs.LG])
    We consider the problem of information aggregation in federated decision making, where a group of agents collaborate to infer the underlying state of nature without sharing their private data with the central processor or each other. We analyze the non-Bayesian social learning strategy in which agents incorporate their individual observations into their opinions (i.e., soft-decisions) with Bayes rule, and the central processor aggregates these opinions by arithmetic or geometric averaging. Building on our previous work, we establish that both pooling strategies result in asymptotic normality characterization of the system, which, for instance, can be utilized in order to give approximate expressions for the error probability. We verify the theoretical findings with simulations and compare both strategies.  ( 2 min )
    Accelerating Distributed Deep Reinforcement Learning by In-Network Experience Sampling. (arXiv:2110.13506v3 [cs.DC] UPDATED)
    A computing cluster that interconnects multiple compute nodes is used to accelerate distributed reinforcement learning based on DQN (Deep Q-Network). In distributed reinforcement learning, Actor nodes acquire experiences by interacting with a given environment and a Learner node optimizes their DQN model. Since data transfer between Actor and Learner nodes increases depending on the number of Actor nodes and their experience size, communication overhead between them is one of major performance bottlenecks. In this paper, their communication is accelerated by DPDK-based network optimizations, and DPDK-based low-latency experience replay memory server is deployed between Actor and Learner nodes interconnected with a 40GbE (40Gbit Ethernet) network. Evaluation results show that, as a network optimization technique, kernel bypassing by DPDK reduces network access latencies to a shared memory server by 32.7% to 58.9%. As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication latencies for prioritized experience sampling by 21.9% to 29.1%.  ( 2 min )
    Advancing Spiking Neural Networks towards Deep Residual Learning. (arXiv:2112.08954v3 [cs.NE] UPDATED)
    Despite the rapid progress of neuromorphic computing, inadequate capacity and insufficient representation power of spiking neural networks (SNNs) severely restrict their application scope in practice. Residual learning and shortcuts have been evidenced as an important approach for training deep neural networks, but rarely did previous work assess their applicability to the characteristics of spike-based communication and spatiotemporal dynamics. In this paper, we first identify that this negligence leads to impeded information flow and the accompanying degradation problem in previous residual SNNs. To address this issue, we propose a novel SNN-oriented residual architecture termed MS-ResNet, which establishes membrane-based shortcut pathways, and further prove that the gradient norm equality can be achieved in MS-ResNet by introducing block dynamical isometry theory, which ensures the network can be well-behaved in a depth-insensitive way. Thus we are able to significantly extend the depth of directly trained SNNs, e.g., up to 482 layers on CIFAR-10 and 104 layers on ImageNet, without observing any slight degradation problem. To validate the effectiveness of MS-ResNet, experiments on both frame-based and neuromorphic datasets are conducted. MS-ResNet104 achieves a superior result of 76.02% accuracy on ImageNet, which is the highest to our best knowledge in the domain of directly trained SNNs. Great energy efficiency is also observed, with an average of only one spike per neuron needed to classify an input sample. We believe our powerful and scalable models will provide a strong support for further exploration of SNNs.  ( 2 min )
    Efficient recurrent architectures through activity sparsity and sparse back-propagation through time. (arXiv:2206.06178v3 [cs.LG] UPDATED)
    Recurrent neural networks (RNNs) are well suited for solving sequence tasks in resource-constrained systems due to their expressivity and low computational requirements. However, there is still a need to bridge the gap between what RNNs are capable of in terms of efficiency and performance and real-world application requirements. The memory and computational requirements arising from propagating the activations of all the neurons at every time step to every connected neuron, together with the sequential dependence of activations, contribute to the inefficiency of training and using RNNs. We propose a solution inspired by biological neuron dynamics that makes the communication between RNN units sparse and discrete. This makes the backward pass with backpropagation through time (BPTT) computationally sparse and efficient as well. We base our model on the gated recurrent unit (GRU), extending it with units that emit discrete events for communication triggered by a threshold so that no information is communicated to other units in the absence of events. We show theoretically that the communication between units, and hence the computation required for both the forward and backward passes, scales with the number of events in the network. Our model achieves efficiency without compromising task performance, demonstrating competitive performance compared to state-of-the-art recurrent network models in real-world tasks, including language modeling. The dynamic activity sparsity mechanism also makes our model well suited for novel energy-efficient neuromorphic hardware. Code is available at https://github.com/KhaleelKhan/EvNN/.  ( 2 min )
    EiX-GNN : Concept-level eigencentrality explainer for graph neural networks. (arXiv:2206.03491v6 [cs.AI] UPDATED)
    Nowadays, deep prediction models, especially graph neural networks, have a majorplace in critical applications. In such context, those models need to be highlyinterpretable or being explainable by humans, and at the societal scope, this understandingmay also be feasible for humans that do not have a strong prior knowledgein models and contexts that need to be explained. In the literature, explainingis a human knowledge transfer process regarding a phenomenon between an explainerand an explainee. We propose EiX-GNN (Eigencentrality eXplainer forGraph Neural Networks) a new powerful method for explaining graph neural networksthat encodes computationally this social explainer-to-explainee dependenceunderlying in the explanation process. To handle this dependency, we introducethe notion of explainee concept assimibility which allows explainer to adapt itsexplanation to explainee background or expectation. We lead a qualitative studyto illustrate our explainee concept assimibility notion on real-world data as wellas a qualitative study that compares, according to objective metrics established inthe literature, fairness and compactness of our method with respect to performingstate-of-the-art methods. It turns out that our method achieves strong results inboth aspects.  ( 2 min )
    You Only Need End-to-End Training for Long-Tailed Recognition. (arXiv:2112.05958v4 [cs.CV] UPDATED)
    The generalization gap on the long-tailed data sets is largely owing to most categories only occupying a few training samples. Decoupled training achieves better performance by training backbone and classifier separately. What causes the poorer performance of end-to-end model training (e.g., logits margin-based methods)? In this work, we identify a key factor that affects the learning of the classifier: the channel-correlated features with low entropy before inputting into the classifier. From the perspective of information theory, we analyze why cross-entropy loss tends to produce highly correlated features on the imbalanced data. In addition, we theoretically analyze and prove its impacts on the gradients of classifier weights, the condition number of Hessian, and logits margin-based approach. Therefore, we firstly propose to use Channel Whitening to decorrelate ("scatter") the classifier's inputs for decoupling the weight update and reshaping the skewed decision boundary, which achieves satisfactory results combined with logits margin-based method. However, when the number of minor classes are large, batch imbalance and more participation in training cause over-fitting of the major classes. We also propose two novel modules, Block-based Relatively Balanced Batch Sampler (B3RS) and Batch Embedded Training (BET) to solve the above problems, which makes the end-to-end training achieve even better performance than decoupled training. Experimental results on the long-tailed classification benchmarks, CIFAR-LT and ImageNet-LT, demonstrate the effectiveness of our method.  ( 2 min )
    VALERIAN: Invariant Feature Learning for IMU Sensor-based Human Activity Recognition in the Wild. (arXiv:2303.06048v1 [eess.SP])
    Deep neural network models for IMU sensor-based human activity recognition (HAR) that are trained from controlled, well-curated datasets suffer from poor generalizability in practical deployments. However, data collected from naturalistic settings often contains significant label noise. In this work, we examine two in-the-wild HAR datasets and DivideMix, a state-of-the-art learning with noise labels (LNL) method to understand the extent and impacts of noisy labels in training data. Our empirical analysis reveals that the substantial domain gaps among diverse subjects cause LNL methods to violate a key underlying assumption, namely, neural networks tend to fit simpler (and thus clean) data in early training epochs. Motivated by the insights, we design VALERIAN, an invariant feature learning method for in-the-wild wearable sensor-based HAR. By training a multi-task model with separate task-specific layers for each subject, VALERIAN allows noisy labels to be dealt with individually while benefiting from shared feature representation across subjects. We evaluated VALERIAN on four datasets, two collected in a controlled environment and two in the wild.
    Pistol: Pupil Invisible Supportive Tool to extract Pupil, Iris, Eye Opening, Eye Movements, Pupil and Iris Gaze Vector, and 2D as well as 3D Gaze. (arXiv:2201.06799v2 [cs.CV] UPDATED)
    This paper describes a feature extraction and gaze estimation software, named \textit{Pistol} that can be used with Pupil Invisible projects and other eye trackers in the future. In offline mode, our software extracts multiple features from the eye including, the pupil and iris ellipse, eye aperture, pupil vector, iris vector, eye movement types from pupil and iris velocities, marker detection, marker distance, 2D gaze estimation for the pupil center, iris center, pupil vector, and iris vector using Levenberg Marquart fitting and neural networks. The gaze signal is computed in 2D for each eye and each feature separately and for both eyes in 3D also for each feature separately. We hope this software helps other researchers to extract state-of-the-art features for their research out of their recordings. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FPISTOL&mode=list
    GFlowCausal: Generative Flow Networks for Causal Discovery. (arXiv:2210.08185v2 [cs.LG] UPDATED)
    Causal discovery aims to uncover causal structure among a set of variables. Score-based approaches mainly focus on searching for the best Directed Acyclic Graph (DAG) based on a predefined score function. However, most of them are not applicable on a large scale due to the limited searchability. Inspired by the active learning in generative flow networks, we propose a novel approach to learning a DAG from observational data called GFlowCausal. It converts the graph search problem to a generation problem, in which direct edges are added gradually. GFlowCausal aims to learn the best policy to generate high-reward DAGs by sequential actions with probabilities proportional to predefined rewards. We propose a plug-and-play module based on transitive closure to ensure efficient sampling. Theoretical analysis shows that this module could guarantee acyclicity properties effectively and the consistency between final states and fully-connected graphs. We conduct extensive experiments on both synthetic and real datasets, and results show the proposed approach to be superior and also performs well in a large-scale setting.
    Tactile-Filter: Interactive Tactile Perception for Part Mating. (arXiv:2303.06034v1 [cs.RO])
    Humans rely on touch and tactile sensing for a lot of dexterous manipulation tasks. Our tactile sensing provides us with a lot of information regarding contact formations as well as geometric information about objects during any interaction. With this motivation, vision-based tactile sensors are being widely used for various robotic perception and control tasks. In this paper, we present a method for interactive perception using vision-based tactile sensors for multi-object assembly. In particular, we are interested in tactile perception during part mating, where a robot can use tactile sensors and a feedback mechanism using particle filter to incrementally improve its estimate of objects that fit together for assembly. To do this, we first train a deep neural network that makes use of tactile images to predict the probabilistic correspondence between arbitrarily shaped objects that fit together. The trained model is used to design a particle filter which is used twofold. First, given one partial (or non-unique) observation of the hole, it incrementally improves the estimate of the correct peg by sampling more tactile observations. Second, it selects the next action for the robot to sample the next touch (and thus image) which results in maximum uncertainty reduction to minimize the number of interactions during the perception task. We evaluate our method on several part-mating tasks for assembly using a robot equipped with a vision-based tactile sensor. We also show the efficiency of the proposed action selection method against a naive method. See supplementary video at https://www.youtube.com/watch?v=jMVBg_e3gLw .
    RawNet: Fast End-to-End Neural Vocoder. (arXiv:1904.05351v2 [eess.AS] UPDATED)
    Neural network-based vocoders have recently demonstrated the powerful ability to synthesize high-quality speech. These models usually generate samples by conditioning on spectral features, such as Mel-spectrogram and fundamental frequency, which is crucial to speech synthesis. However, the feature extraction procession tends to depend heavily on human knowledge resulting in a less expressive description of the origin audio. In this work, we proposed RawNet, a complete end-to-end neural vocoder following the auto-encoder structure for speaker-dependent and -independent speech synthesis. It automatically learns to extract features and recover audio using neural networks, which include a coder network to capture a higher representation of the input audio and an autoregressive voder network to restore the audio in a sample-by-sample manner. The coder and voder are jointly trained directly on the raw waveform without any human-designed features. The experimental results show that RawNet achieves a better speech quality using a simplified model architecture and obtains a faster speech generation speed at the inference stage.  ( 2 min )
    POLICE: Provably Optimal Linear Constraint Enforcement for Deep Neural Networks. (arXiv:2211.01340v3 [cs.LG] UPDATED)
    Deep Neural Networks (DNNs) outshine alternative function approximators in many settings thanks to their modularity in composing any desired differentiable operator. The formed parametrized functional is then tuned to solve a task at hand from simple gradient descent. This modularity comes at the cost of making strict enforcement of constraints on DNNs, e.g. from a priori knowledge of the task, or from desired physical properties, an open challenge. In this paper we propose the first provable affine constraint enforcement method for DNNs that only requires minimal changes into a given DNN's forward-pass, that is computationally friendly, and that leaves the optimization of the DNN's parameter to be unconstrained, i.e. standard gradient-based method can be employed. Our method does not require any sampling and provably ensures that the DNN fulfills the affine constraint on a given input space's region at any point during training, and testing. We coin this method POLICE, standing for Provably Optimal LInear Constraint Enforcement. Github: https://github.com/RandallBalestriero/POLICE
    Skew Class-balanced Re-weighting for Unbiased Scene Graph Generation. (arXiv:2301.00351v2 [cs.LG] UPDATED)
    An unbiased scene graph generation (SGG) algorithm referred to as Skew Class-balanced Re-weighting (SCR) is proposed for considering the unbiased predicate prediction caused by the long-tailed distribution. The prior works focus mainly on alleviating the deteriorating performances of the minority predicate predictions, showing drastic dropping recall scores, i.e., losing the majority predicate performances. It has not yet correctly analyzed the trade-off between majority and minority predicate performances in the limited SGG datasets. In this paper, to alleviate the issue, the Skew Class-balanced Re-weighting (SCR) loss function is considered for the unbiased SGG models. Leveraged by the skewness of biased predicate predictions, the SCR estimates the target predicate weight coefficient and then re-weights more to the biased predicates for better trading-off between the majority predicates and the minority ones. Extensive experiments conducted on the standard Visual Genome dataset and Open Image V4 \& V6 show the performances and generality of the SCR with the traditional SGG models.
    Exphormer: Sparse Transformers for Graphs. (arXiv:2303.06147v1 [cs.LG])
    Graph transformers have emerged as a promising architecture for a variety of graph learning and representation tasks. Despite their successes, though, it remains challenging to scale graph transformers to large graphs while maintaining accuracy competitive with message-passing networks. In this paper, we introduce Exphormer, a framework for building powerful and scalable graph transformers. Exphormer consists of a sparse attention mechanism based on two mechanisms: virtual global nodes and expander graphs, whose mathematical characteristics, such as spectral expansion, pseduorandomness, and sparsity, yield graph transformers with complexity only linear in the size of the graph, while allowing us to prove desirable theoretical properties of the resulting transformer models. We show that incorporating \textsc{Exphormer} into the recently-proposed GraphGPS framework produces models with competitive empirical results on a wide variety of graph datasets, including state-of-the-art results on three datasets. We also show that \textsc{Exphormer} can scale to datasets on larger graphs than shown in previous graph transformer architectures. Code can be found at https://github.com/hamed1375/Exphormer.  ( 2 min )
    Deep Clustering Survival Machines with Interpretable Expert Distributions. (arXiv:2301.11826v3 [cs.LG] UPDATED)
    Conventional survival analysis methods are typically ineffective to characterize heterogeneity in the population while such information can be used to assist predictive modeling. In this study, we propose a hybrid survival analysis method, referred to as deep clustering survival machines, that combines the discriminative and generative mechanisms. Similar to the mixture models, we assume that the timing information of survival data is generatively described by a mixture of certain numbers of parametric distributions, i.e., expert distributions. We learn weights of the expert distributions for individual instances according to their features discriminatively such that each instance's survival information can be characterized by a weighted combination of the learned constant expert distributions. This method also facilitates interpretable subgrouping/clustering of all instances according to their associated expert distributions. Extensive experiments on both real and synthetic datasets have demonstrated that the method is capable of obtaining promising clustering results and competitive time-to-event predicting performance.
    Reconstructing the Hubble parameter with future Gravitational Wave missions using Machine Learning. (arXiv:2303.05169v1 [astro-ph.CO] CROSS LISTED)
    We study the prospects of Machine Learning algorithms like Gaussian processes (GP) as a tool to reconstruct the Hubble parameter $H(z)$ with two upcoming gravitational wave missions, namely the evolved Laser Interferometer Space Antenna (eLISA) and the Einstein Telescope (ET). We perform non-parametric reconstructions of $H(z)$ with GP using realistically generated catalogues, assuming various background cosmological models, for each mission. We also take into account the effect of early-time and late-time priors separately on the reconstruction, and hence on the Hubble constant ($H_0$). Our analysis reveals that GPs are quite robust in reconstructing the expansion history of the Universe within the observational window of the specific mission under study. We further confirm that both eLISA and ET would be able to constrain $H(z)$ and $H_0$ to a much higher precision than possible today, and also find out their possible role in addressing the Hubble tension for each model, on a case-by-case basis.
    Seq2Seq Surrogates of Epidemic Models to Facilitate Bayesian Inference. (arXiv:2209.09617v2 [cs.LG] UPDATED)
    Epidemic models are powerful tools in understanding infectious disease. However, as they increase in size and complexity, they can quickly become computationally intractable. Recent progress in modelling methodology has shown that surrogate models can be used to emulate complex epidemic models with a high-dimensional parameter space. We show that deep sequence-to-sequence (seq2seq) models can serve as accurate surrogates for complex epidemic models with sequence based model parameters, effectively replicating seasonal and long-term transmission dynamics. Once trained, our surrogate can predict scenarios a several thousand times faster than the original model, making them ideal for policy exploration. We demonstrate that replacing a traditional epidemic model with a learned simulator facilitates robust Bayesian inference.
    Multi-Task Recommendations with Reinforcement Learning. (arXiv:2302.03328v2 [cs.IR] UPDATED)
    In recent years, Multi-task Learning (MTL) has yielded immense success in Recommender System (RS) applications. However, current MTL-based recommendation models tend to disregard the session-wise patterns of user-item interactions because they are predominantly constructed based on item-wise datasets. Moreover, balancing multiple objectives has always been a challenge in this field, which is typically avoided via linear estimations in existing works. To address these issues, in this paper, we propose a Reinforcement Learning (RL) enhanced MTL framework, namely RMTL, to combine the losses of different recommendation tasks using dynamic weights. To be specific, the RMTL structure can address the two aforementioned issues by (i) constructing an MTL environment from session-wise interactions and (ii) training multi-task actor-critic network structure, which is compatible with most existing MTL-based recommendation models, and (iii) optimizing and fine-tuning the MTL loss function using the weights generated by critic networks. Experiments on two real-world public datasets demonstrate the effectiveness of RMTL with a higher AUC against state-of-the-art MTL-based recommendation models. Additionally, we evaluate and validate RMTL's compatibility and transferability across various MTL models.
    ReAct: Synergizing Reasoning and Acting in Language Models. (arXiv:2210.03629v3 [cs.CL] UPDATED)
    While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: https://react-lm.github.io
    Zero-One Laws of Graph Neural Networks. (arXiv:2301.13060v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) are de facto standard deep learning architectures for machine learning on graphs. This has led to a large body of work analyzing the capabilities and limitations of these models, particularly pertaining to their representation and extrapolation capacity. We offer a novel theoretical perspective on the representation and extrapolation capacity of GNNs, by answering the question: how do GNNs behave as the number of graph nodes become very large? Under mild assumptions, we show that when we draw graphs of increasing size from the Erd\H{o}s-R\'enyi model, the probability that such graphs are mapped to a particular output by a class of GNN classifiers tends to either zero or to one. This class includes the popular graph convolutional network architecture. The result establishes 'zero-one laws' for these GNNs, and analogously to other convergence laws, entails theoretical limitations on their capacity. We empirically verify our results, observing that the theoretical asymptotic limits are evident already on relatively small graphs.
    Exploring the Relationship between Architecture and Adversarially Robust Generalization. (arXiv:2209.14105v2 [cs.LG] UPDATED)
    Adversarial training has been demonstrated to be one of the most effective remedies for defending adversarial examples, yet it often suffers from the huge robustness generalization gap on unseen testing adversaries, deemed as the adversarially robust generalization problem. Despite the preliminary understandings devoted to adversarially robust generalization, little is known from the architectural perspective. To bridge the gap, this paper for the first time systematically investigated the relationship between adversarially robust generalization and architectural design. Inparticular, we comprehensively evaluated 20 most representative adversarially trained architectures on ImageNette and CIFAR-10 datasets towards multiple `p-norm adversarial attacks. Based on the extensive experiments, we found that, under aligned settings, Vision Transformers (e.g., PVT, CoAtNet) often yield better adversarially robust generalization while CNNs tend to overfit on specific attacks and fail to generalize on multiple adversaries. To better understand the nature behind it, we conduct theoretical analysis via the lens of Rademacher complexity. We revealed the fact that the higher weight sparsity contributes significantly towards the better adversarially robust generalization of Transformers, which can be often achieved by the specially-designed attention blocks. We hope our paper could help to better understand the mechanism for designing robust DNNs. Our model weights can be found at this http URL
    Guaranteed Conformance of Neurosymbolic Models to Natural Constraints. (arXiv:2212.01346v6 [cs.LG] UPDATED)
    Deep neural networks have emerged as the workhorse for a large section of robotics and control applications, especially as models for dynamical systems. Such data-driven models are in turn used for designing and verifying autonomous systems. This is particularly useful in modeling medical systems where data can be leveraged to individualize treatment. In safety-critical applications, it is important that the data-driven model is conformant to established knowledge from the natural sciences. Such knowledge is often available or can often be distilled into a (possibly black-box) model $M$. For instance, the unicycle model (which encodes Newton's laws) for an F1 racing car. In this light, we consider the following problem - given a model $M$ and state transition dataset, we wish to best approximate the system model while being bounded distance away from $M$. We propose a method to guarantee this conformance. Our first step is to distill the dataset into few representative samples called memories, using the idea of a growing neural gas. Next, using these memories we partition the state space into disjoint subsets and compute bounds that should be respected by the neural network, when the input is drawn from a particular subset. This serves as a symbolic wrapper for guaranteed conformance. We argue theoretically that this only leads to bounded increase in approximation error; which can be controlled by increasing the number of memories. We experimentally show that on three case studies (Car Model, Drones, and Artificial Pancreas), our constrained neurosymbolic models conform to specified $M$ models (each encoding various constraints) with order-of-magnitude improvements compared to the augmented Lagrangian and vanilla training methods. Our code can be found at https://github.com/kaustubhsridhar/Constrained_Models
    On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation. (arXiv:2211.09634v3 [cs.LG] UPDATED)
    We investigate the sample complexity of bounded two-layer neural networks using different activation functions. In particular, we consider the class $$ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, \sigma \circ W\textbf{b} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^d, W \in \mathbb{R}^{\mathcal{T}\times d}, \textbf{v} \in \mathbb{R}^{\mathcal{T}}\right\} $$ where the spectral norm of $W$ and $\textbf{v}$ is bounded by $O(1)$, the Frobenius norm of $W$ is bounded from its initialization by $R > 0$, and $\sigma$ is a Lipschitz activation function. We prove that if $\sigma$ is element-wise, then the sample complexity of $\mathcal{H}$ has only logarithmic dependency in width and that this complexity is tight, up to logarithmic factors. We further show that the element-wise property of $\sigma$ is essential for a logarithmic dependency bound in width, in the sense that there exist non-element-wise activation functions whose sample complexity is linear in width, for widths that can be up to exponential in the input dimension. For the upper bound, we use the recent approach for norm-based bounds named Approximate Description Length (ADL) by arXiv:1910.05697. We further develop new techniques and tools for this approach that will hopefully inspire future works.
    ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges. (arXiv:2303.01498v2 [cs.CV] UPDATED)
    The fifth Affective Behavior Analysis in-the-wild (ABAW) Competition is part of the respective ABAW Workshop which will be held in conjunction with IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2023. The 5th ABAW Competition is a continuation of the Competitions held at ECCV 2022, IEEE CVPR 2022, ICCV 2021, IEEE FG 2020 and CVPR 2017 Conferences, and is dedicated at automatically analyzing affect. For this year's Competition, we feature two corpora: i) an extended version of the Aff-Wild2 database and ii) the Hume-Reaction dataset. The former database is an audiovisual one of around 600 videos of around 3M frames and is annotated with respect to:a) two continuous affect dimensions -valence (how positive/negative a person is) and arousal (how active/passive a person is)-; b) basic expressions (e.g. happiness, sadness, neutral state); and c) atomic facial muscle actions (i.e., action units). The latter dataset is an audiovisual one in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities. Thus the 5th ABAW Competition encompasses four Challenges: i) uni-task Valence-Arousal Estimation, ii) uni-task Expression Classification, iii) uni-task Action Unit Detection, and iv) Emotional Reaction Intensity Estimation. In this paper, we present these Challenges, along with their corpora, we outline the evaluation metrics, we present the baseline systems and illustrate their obtained performance.
    Communication Size Reduction of Federated Learning using Neural ODE Models. (arXiv:2208.09478v3 [cs.LG] UPDATED)
    Federated learning is a machine learning approach in which data is not aggregated on a server, but is trained at clients locally, in consideration of security and privacy. ResNet is a classic but representative neural network that succeeds in deepening the neural network by learning a residual function that adds the inputs and outputs together. In federated learning, communication is performed between the server and clients to exchange weight parameters. Since ResNet has deep layers and a large number of parameters, the communication size becomes large. In this paper, we use Neural ODE as a lightweight model of ResNet to reduce communication size in federated learning. In addition, we newly introduce a flexible federated learning using Neural ODE models with different number of iterations, which correspond to ResNet models with different depths. Evaluation results using CIFAR-10 dataset show that the use of Neural ODE reduces communication size by up to 92.4% compared to ResNet. We also show that the proposed flexible federated learning can merge models with different iteration counts or depths.
    Machine Learning-powered Course Allocation. (arXiv:2210.00954v2 [cs.GT] UPDATED)
    We introduce a machine learning-powered course allocation mechanism. Concretely, we extend the state-of-the-art Course Match mechanism with a machine learning-based preference elicitation module. In an iterative, asynchronous manner, this module generates pairwise comparison queries that are tailored to each individual student. Regarding incentives, our machine learning-powered course match (MLCM) mechanism retains the attractive strategyproofness in the large property of Course Match. Regarding welfare, we perform computational experiments using a simulator that was fitted to real-world data. Our results show that, compared to Course Match, MLCM increases average student utility by 4%-9% and minimum student utility by 10%-21%, even with only ten comparison queries. Finally, we highlight the practicability of MLCM and the ease of piloting it for universities currently using Course Match.
    DORA: Exploring outlier representations in Deep Neural Networks. (arXiv:2206.04530v2 [cs.LG] UPDATED)
    Deep Neural Networks (DNNs) draw their power from the representations they learn. However, while being incredibly effective in learning complex abstractions, they are susceptible to learning malicious concepts, due to the spurious correlations inherent in the training data. So far, existing methods for uncovering such artifactual behavior in trained models focus on finding artifacts in the input data, which requires both availability of a data set and human supervision. In this paper, we introduce DORA (Data-agnOstic Representation Analysis): the first data-agnostic framework for the analysis of the representation space of DNNs. We propose a novel distance measure between representations that utilizes self-explaining capabilities within the network itself without access to any data and quantitatively validate its alignment with human-defined semantic distances. We further demonstrate that this metric could be utilized for the detection of anomalous representations, which may bear a risk of learning unintended spurious concepts deviating from the desired decision-making policy. Finally, we demonstrate the practical utility of DORA by analyzing and identifying artifactual representations in widely popular Computer Vision models.
    Uncertainty Estimates of Predictions via a General Bias-Variance Decomposition. (arXiv:2210.12256v2 [cs.LG] UPDATED)
    Reliably estimating the uncertainty of a prediction throughout the model lifecycle is crucial in many safety-critical applications. The most common way to measure this uncertainty is via the predicted confidence. While this tends to work well for in-domain samples, these estimates are unreliable under domain drift and restricted to classification. Alternatively, proper scores can be used for most predictive tasks but a bias-variance decomposition for model uncertainty does not exist in the current literature. In this work we introduce a general bias-variance decomposition for proper scores, giving rise to the Bregman Information as the variance term. We discover how exponential families and the classification log-likelihood are special cases and provide novel formulations. Surprisingly, we can express the classification case purely in the logit space. We showcase the practical relevance of this decomposition on several downstream tasks, including model ensembles and confidence regions. Further, we demonstrate how different approximations of the instance-level Bregman Information allow reliable out-of-distribution detection for all degrees of domain drift.
    Large Language Models Are Human-Level Prompt Engineers. (arXiv:2211.01910v2 [cs.LG] UPDATED)
    By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. In our method, we treat the instruction as the "program," optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function. To evaluate the quality of the selected instruction, we evaluate the zero-shot performance of another LLM following the selected instruction. Experiments on 24 NLP tasks show that our automatically generated instructions outperform the prior LLM baseline by a large margin and achieve better or comparable performance to the instructions generated by human annotators on 19/24 tasks. We conduct extensive qualitative and quantitative analyses to explore the performance of APE. We show that APE-engineered prompts can be applied to steer models toward truthfulness and/or informativeness, as well as to improve few-shot learning performance by simply prepending them to standard in-context learning prompts. Please check out our webpage at https://sites.google.com/view/automatic-prompt-engineer.
    Model-based Causal Bayesian Optimization. (arXiv:2211.10257v2 [cs.LG] UPDATED)
    How should we intervene on an unknown structural equation model to maximize a downstream variable of interest? This setting, also known as causal Bayesian optimization (CBO), has important applications in medicine, ecology, and manufacturing. Standard Bayesian optimization algorithms fail to effectively leverage the underlying causal structure. Existing CBO approaches assume noiseless measurements and do not come with guarantees. We propose the model-based causal Bayesian optimization algorithm (MCBO) that learns a full system model instead of only modeling intervention-reward pairs. MCBO propagates epistemic uncertainty about the causal mechanisms through the graph and trades off exploration and exploitation via the optimism principle. We bound its cumulative regret, and obtain the first non-asymptotic bounds for CBO. Unlike in standard Bayesian optimization, our acquisition function cannot be evaluated in closed form, so we show how the reparameterization trick can be used to apply gradient-based optimizers. The resulting practical implementation of MCBO compares favorably with state-of-the-art approaches empirically.
    Tensor Denoising via Amplification and Stable Rank Methods. (arXiv:2301.03761v2 [cs.LG] UPDATED)
    Tensors in the form of multilinear arrays are ubiquitous in data science applications. Captured real-world data, including video, hyperspectral images, and discretized physical systems, naturally occur as tensors and often come with attendant noise. Under the additive noise model and with the assumption that the underlying clean tensor has low rank, many denoising methods have been created that utilize tensor decomposition to effect denoising through low rank tensor approximation. However, all such decomposition methods require estimating the tensor rank, or related measures such as the tensor spectral and nuclear norms, all of which are NP-hard problems. In this work we leverage our previously developed framework of $\textit{tensor amplification}$, which provides good approximations of the spectral and nuclear tensor norms, to denoising synthetic tensors of various sizes, ranks, and noise levels, along with real-world tensors derived from physiological signals. We also introduce two new notions of tensor rank -- $\textit{stable slice rank}$ and $\textit{stable }$$X$$\textit{-rank}$ -- and new denoising methods based on their estimation. The experimental results show that in the low rank context, tensor-based amplification provides comparable denoising performance in high signal-to-noise ratio (SNR) settings and superior performance in noisy (i.e., low SNR) settings, while the stable $X$-rank method achieves superior denoising performance on the physiological signal data.
    DEJA VU: Continual Model Generalization For Unseen Domains. (arXiv:2301.10418v2 [cs.LG] UPDATED)
    In real-world applications, deep learning models often run in non-stationary environments where the target data distribution continually shifts over time. There have been numerous domain adaptation (DA) methods in both online and offline modes to improve cross-domain adaptation ability. However, these DA methods typically only provide good performance after a long period of adaptation, and perform poorly on new domains before and during adaptation - in what we call the "Unfamiliar Period", especially when domain shifts happen suddenly and significantly. On the other hand, domain generalization (DG) methods have been proposed to improve the model generalization ability on unadapted domains. However, existing DG works are ineffective for continually changing domains due to severe catastrophic forgetting of learned knowledge. To overcome these limitations of DA and DG in handling the Unfamiliar Period during continual domain shift, we propose RaTP, a framework that focuses on improving models' target domain generalization (TDG) capability, while also achieving effective target domain adaptation (TDA) capability right after training on certain domains and forgetting alleviation (FA) capability on past domains. RaTP includes a training-free data augmentation module to prepare data for TDG, a novel pseudo-labeling mechanism to provide reliable supervision for TDA, and a prototype contrastive alignment algorithm to align different domains for achieving TDG, TDA and FA. Extensive experiments on Digits, PACS, and DomainNet demonstrate that RaTP significantly outperforms state-of-the-art works from Continual DA, Source-Free DA, Test-Time/Online DA, Single DG, Multiple DG and Unified DA&DG in TDG, and achieves comparable TDA and FA capabilities.
    Low Dimensional Invariant Embeddings for Universal Geometric Learning. (arXiv:2205.02956v2 [cs.LG] UPDATED)
    This paper studies separating invariants: mappings on $D$ dimensional domains which are invariant to an appropriate group action, and which separate orbits. The motivation for this study comes from the usefulness of separating invariants in proving universality of equivariant neural network architectures. We observe that in several cases the cardinality of separating invariants proposed in the machine learning literature is much larger than the dimension $D$. As a result, the theoretical universal constructions based on these separating invariants is unrealistically large. Our goal in this paper is to resolve this issue. We show that when a continuous family of semi-algebraic separating invariants is available, separation can be obtained by randomly selecting 2D+1 of these invariants. We apply this methodology to obtain an efficient scheme for computing separating invariants for several classical group actions which have been studied in the invariant learning literature. Examples include matrix multiplication actions on point clouds by permutations, rotations, and various other linear groups. Often the requirement of invariant separation is relaxed and only generic separation is required. In this case, we show that only D+1 invariants are required. More importantly, generic invariants are often significantly easier to compute, as we illustrate by discussing generic and full separation for weighted graphs. Finally we outline an approach for proving that separating invariants can be constructed also when the random parameters have finite precision.
    No Reason for No Supervision: Improved Generalization in Supervised Models. (arXiv:2206.15369v2 [cs.CV] UPDATED)
    We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels at both the training task as well as at other (future) transfer tasks. These two seemingly contradictory properties impose a trade-off between improving the model's generalization and maintaining its performance on the original task. Models trained with self-supervised learning tend to generalize better than their supervised counterparts for transfer learning; yet, they still lag behind supervised models on IN1K. In this paper, we propose a supervised learning setup that leverages the best of both worlds. We extensively analyze supervised training using multi-scale crops for data augmentation and an expendable projector head, and reveal that the design of the projector allows us to control the trade-off between performance on the training task and transferability. We further replace the last layer of class weights with class prototypes computed on the fly using a memory bank and derive two models: t-ReX that achieves a new state of the art for transfer learning and outperforms top methods such as DINO and PAWS on IN1K, and t-ReX* that matches the highly optimized RSB-A1 model on IN1K while performing better on transfer tasks. Code and pretrained models: https://europe.naverlabs.com/t-rex
    Exploiting Proximity-Aware Tasks for Embodied Social Navigation. (arXiv:2212.00767v2 [cs.CV] UPDATED)
    Learning how to navigate among humans in an occluded and spatially constrained indoor environment, is a key ability required to embodied agent to be integrated into our society. In this paper, we propose an end-to-end architecture that exploits Proximity-Aware Tasks (referred as to Risk and Proximity Compass) to inject into a reinforcement learning navigation policy the ability to infer common-sense social behaviors. To this end, our tasks exploit the notion of immediate and future dangers of collision. Furthermore, we propose an evaluation protocol specifically designed for the Social Navigation Task in simulated environments. This is done to capture fine-grained features and characteristics of the policy by analyzing the minimal unit of human-robot spatial interaction, called Encounter. We validate our approach on Gibson4+ and Habitat-Matterport3D datasets.
    SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models. (arXiv:2106.00553v4 [cs.LG] UPDATED)
    In recent years, implicit deep learning has emerged as a method to increase the effective depth of deep neural networks. While their training is memory-efficient, they are still significantly slower to train than their explicit counterparts. In Deep Equilibrium Models (DEQs), the training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. In this paper, we propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer. The main idea is to use the quasi-Newton matrices from the forward pass to efficiently approximate the inverse Jacobian matrix in the direction needed for the gradient computation. We provide a theorem that motivates using our method with the original forward algorithms. In addition, by modifying these forward algorithms, we further provide theoretical guarantees that our method asymptotically estimates the true implicit gradient. We empirically study this approach and the recent Jacobian-Free method in different settings, ranging from hyperparameter optimization to large Multiscale DEQs (MDEQs) applied to CIFAR and ImageNet. Both methods reduce significantly the computational cost of the backward pass. While SHINE has a clear advantage on hyperparameter optimization problems, both methods attain similar computational performances for larger scale problems such as MDEQs at the cost of a limited performance drop compared to the original models.
    Learning POD of Complex Dynamics Using Heavy-ball Neural ODEs. (arXiv:2202.12373v2 [cs.LG] UPDATED)
    Proper orthogonal decomposition (POD) allows reduced-order modeling of complex dynamical systems at a substantial level, while maintaining a high degree of accuracy in modeling the underlying dynamical systems. Advances in machine learning algorithms enable learning POD-based dynamics from data and making accurate and fast predictions of dynamical systems. In this paper, we leverage the recently proposed heavy-ball neural ODEs (HBNODEs) [Xia et al. NeurIPS, 2021] for learning data-driven reduced-order models (ROMs) in the POD context, in particular, for learning dynamics of time-varying coefficients generated by the POD analysis on training snapshots generated from solving full order models. HBNODE enjoys several practical advantages for learning POD-based ROMs with theoretical guarantees, including 1) HBNODE can learn long-term dependencies effectively from sequential observations and 2) HBNODE is computationally efficient in both training and testing. We compare HBNODE with other popular ROMs on several complex dynamical systems, including the von K\'{a}rm\'{a}n Street flow, the Kurganov-Petrova-Popov equation, and the one-dimensional Euler equations for fluids modeling.
    Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models. (arXiv:2210.07723v2 [stat.ML] UPDATED)
    Various privacy-preserving frameworks that respect the individual's privacy in the analysis of data have been developed in recent years. However, available model classes such as simple statistics or generalized linear models lack the flexibility required for a good approximation of the underlying data-generating process in practice. In this paper, we propose an algorithm for a distributed, privacy-preserving, and lossless estimation of generalized additive mixed models (GAMM) using component-wise gradient boosting (CWB). Making use of CWB allows us to reframe the GAMM estimation as a distributed fitting of base learners using the $L_2$-loss. In order to account for the heterogeneity of different data location sites, we propose a distributed version of a row-wise tensor product that allows the computation of site-specific (smooth) effects. Our adaption of CWB preserves all the important properties of the original algorithm, such as an unbiased feature selection and the feasibility to fit models in high-dimensional feature spaces, and yields equivalent model estimates as CWB on pooled data. Next to a derivation of the equivalence of both algorithms, we also showcase the efficacy of our algorithm on a distributed heart disease data set and compare it with state-of-the-art methods.
    Multidimensional Interactive Fixed-Effects. (arXiv:2209.11691v2 [econ.EM] UPDATED)
    This paper studies a linear and additively separable model for multidimensional panel data of three or more dimensions with unobserved interactive fixed effects. Two approaches are considered to account for these unobserved interactive fixed-effects when estimating coefficients on the observed covariates. First, the model is embedded within the standard two-dimensional panel framework and restrictions are derived under which the factor structure methods in Bai (2009) lead to consistent estimation of model parameters, but at potentially slow rates of convergence. The second approach utilises popular machine learning techniques to develop group fixed-effects and kernel weighted fixed-effects that are more robust to the multidimensional nature of the problem and can achieve the parametric rate of consistency under certain conditions. Theoretical results and simulations show the benefit of standard two-dimensional panel methods when the structure of the interactive fixed-effect term is known, but also highlight how the group fixed-effects and kernel methods perform well without knowledge of this structure. The methods are implemented to estimate the demand elasticity for beer under a handful of models for demand.
    APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning. (arXiv:2209.06119v4 [cs.LG] UPDATED)
    Activation Functions introduce non-linearity in the deep neural networks. This nonlinearity helps the neural networks learn faster and efficiently from the dataset. In deep learning, many activation functions are developed and used based on the type of problem statement. ReLU's variants, SWISH, and MISH are goto activation functions. MISH function is considered having similar or even better performance than SWISH, and much better than ReLU. In this paper, we propose an activation function named APTx which behaves similar to MISH, but requires lesser mathematical operations to compute. The lesser computational requirements of APTx does speed up the model training, and thus also reduces the hardware requirement for the deep learning model.
    One step closer to EEG based eye tracking. (arXiv:2303.06039v1 [eess.SP])
    In this paper, we present two approaches and algorithms that adapt areas of interest We present a new deep neural network (DNN) that can be used to directly determine gaze position using EEG data. EEG-based eye tracking is a new and difficult research topic in the field of eye tracking, but it provides an alternative to image-based eye tracking with an input data set comparable to conventional image processing. The presented DNN exploits spatial dependencies of the EEG signal and uses convolutions similar to spatial filtering, which is used for preprocessing EEG signals. By this, we improve the direct gaze determination from the EEG signal compared to the state of the art by 3.5 cm MAE (Mean absolute error), but unfortunately still do not achieve a directly applicable system, since the inaccuracy is still significantly higher compared to image-based eye trackers. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FEEGGaze&mode=list
    Non-invasive Waveform Analysis for Emergency Triage via Simulated Hemorrhage: An Experimental Study using Novel Dynamic Lower Body Negative Pressure Model. (arXiv:2303.06064v1 [eess.SP])
    The extent to which advanced waveform analysis of non-invasive physiological signals can diagnose levels of hypovolemia remains insufficiently explored. The present study explores the discriminative ability of a deep learning (DL) framework to classify levels of ongoing hypovolemia, simulated via novel dynamic lower body negative pressure (LBNP) model among healthy volunteers. We used a dynamic LBNP protocol as opposed to the traditional model, where LBNP is applied in a predictable step-wise, progressively descending manner. This dynamic LBNP version assists in circumventing the problem posed in terms of time dependency, as in real-life pre-hospital settings, intravascular blood volume may fluctuate due to volume resuscitation. A supervised DL-based framework for ternary classification was realized by segmenting the underlying noninvasive signal and labeling segments with corresponding LBNP target levels. The proposed DL model with two inputs was trained with respective time-frequency representations extracted on waveform segments to classify each of them into blood volume loss: Class 1 (mild); Class 2 (moderate); or Class 3 (severe). At the outset, the latent space derived at the end of the DL model via late fusion among both inputs assists in enhanced classification performance. When evaluated in a 3-fold cross-validation setup with stratified subjects, the experimental findings demonstrated PPG to be a potential surrogate for variations in blood volume with average classification performance, AUROC: 0.8861, AUPRC: 0.8141, $F1$-score:72.16%, Sensitivity:79.06 %, and Specificity:89.21 %. Our proposed DL algorithm on PPG signal demonstrates the possibility of capturing the complex interplay in physiological responses related to both bleeding and fluid resuscitation using this challenging LBNP setup.
    Modeling Events and Interactions through Temporal Processes -- A Survey. (arXiv:2303.06067v1 [cs.LG])
    In real-world scenario, many phenomena produce a collection of events that occur in continuous time. Point Processes provide a natural mathematical framework for modeling these sequences of events. In this survey, we investigate probabilistic models for modeling event sequences through temporal processes. We revise the notion of event modeling and provide the mathematical foundations that characterize the literature on the topic. We define an ontology to categorize the existing approaches in terms of three families: simple, marked, and spatio-temporal point processes. For each family, we systematically review the existing approaches based based on deep learning. Finally, we analyze the scenarios where the proposed techniques can be used for addressing prediction and modeling aspects.
    Best of Many Worlds Guarantees for Online Learning with Knapsacks. (arXiv:2202.13710v2 [cs.LG] UPDATED)
    We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints. By casting the learning process over a suitably defined space of strategy mixtures, we recover strong duality on a Lagrangian relaxation of the underlying optimization problem, even for general settings with non-convex reward and resource-consumption functions. Then, we provide the first best-of-many-worlds type framework for this setting, with no-regret guarantees under stochastic, adversarial, and non-stationary inputs. Our framework yields the same regret guarantees of prior work in the stochastic case. On the other hand, when budgets grow at least linearly in the time horizon, it allows us to provide a constant competitive ratio in the adversarial case, which improves over the best known upper bound bound of $O(\log m \log T)$. Moreover, our framework allows the decision maker to handle non-convex reward and cost functions. We provide two game-theoretic applications of our framework to give further evidence of its flexibility. In doing so, we show that it can be employed to implement budget-pacing mechanisms in repeated first-price auctions.
    A Contrastive Approach to Online Change Point Detection. (arXiv:2206.10143v2 [stat.ML] UPDATED)
    We suggest a novel procedure for online change point detection. Our approach expands an idea of maximizing a discrepancy measure between points from pre-change and post-change distributions. This leads to a flexible procedure suitable for both parametric and nonparametric scenarios. We prove non-asymptotic bounds on the average running length of the procedure and its expected detection delay. The efficiency of the algorithm is illustrated with numerical experiments on synthetic and real-world data sets.
    Machine Learning Security in Industry: A Quantitative Survey. (arXiv:2207.05164v2 [cs.LG] UPDATED)
    Despite the large body of academic work on machine learning security, little is known about the occurrence of attacks on machine learning systems in the wild. In this paper, we report on a quantitative study with 139 industrial practitioners. We analyze attack occurrence and concern and evaluate statistical hypotheses on factors influencing threat perception and exposure. Our results shed light on real-world attacks on deployed machine learning. On the organizational level, while we find no predictors for threat exposure in our sample, the amount of implement defenses depends on exposure to threats or expected likelihood to become a target. We also provide a detailed analysis of practitioners' replies on the relevance of individual machine learning attacks, unveiling complex concerns like unreliable decision making, business information leakage, and bias introduction into models. Finally, we find that on the individual level, prior knowledge about machine learning security influences threat perception. Our work paves the way for more research about adversarial machine learning in practice, but yields also insights for regulation and auditing.
    Multivariate Probabilistic Forecasting of Intraday Electricity Prices using Normalizing Flows. (arXiv:2205.13826v4 [cs.LG] UPDATED)
    Electricity is traded on various markets with different time horizons and regulations. Short-term intraday trading becomes increasingly important due to the higher penetration of renewables. In Germany, the intraday electricity price typically fluctuates around the day-ahead price of the European Power EXchange (EPEX) spot markets in a distinct hourly pattern. This work proposes a probabilistic modeling approach that models the intraday price difference to the day-ahead contracts. The model captures the emerging hourly pattern by considering the four 15 min intervals in each day-ahead price interval as a four-dimensional joint probability distribution. The resulting nontrivial, multivariate price difference distribution is learned using a normalizing flow, i.e., a deep generative model that combines conditional multivariate density estimation and probabilistic regression. Furthermore, this work discusses the influence of different external impact factors based on literature insights and impact analysis using explainable artificial intelligence (XAI). The normalizing flow is compared to an informed selection of historical data and probabilistic forecasts using a Gaussian copula and a Gaussian regression model. Among the different models, the normalizing flow identifies the trends with the highest accuracy and has the narrowest prediction intervals. Both the XAI analysis and the empirical experiments highlight that the immediate history of the price difference realization and the increments of the day-ahead price have the most substantial impact on the price difference.
    Maximal Objectives in the Multi-armed Bandit with Applications. (arXiv:2006.06853v6 [cs.LG] UPDATED)
    In several applications of the stochastic multi-armed bandit problem, the traditional objective of maximizing the expected total reward can be inappropriate. In this paper, motivated by certain operational concerns in online platforms, we consider a new objective in the classical setup. Given $K$ arms, instead of maximizing the expected total reward from $T$ pulls (the traditional "sum" objective), we consider the vector of total rewards earned from each of the $K$ arms at the end of $T$ pulls and aim to maximize the expected highest total reward across arms (the "max" objective). For this objective, we show that any policy must incur an instance-dependent asymptotic regret of $\Omega(\log T)$ (with a higher instance-dependent constant compared to the traditional objective) and a worst-case regret of $\Omega(K^{1/3}T^{2/3})$. We then design an adaptive explore-then-commit policy featuring exploration based on appropriately tuned confidence bounds on the mean reward and an adaptive stopping criterion, which adapts to the problem difficulty and achieves these bounds (up to logarithmic factors). We then generalize our algorithmic insights to the problem of maximizing the expected value of the average total reward of the top $m$ arms with the highest total rewards. Our numerical experiments demonstrate the efficacy of our policies compared to several natural alternatives in practical parameter regimes. We discuss applications of these new objectives to the problem of grooming an adequate supply of value-providing market participants (workers/sellers/service providers) in online platforms.
    DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning. (arXiv:1905.13543v3 [cs.CV] UPDATED)
    Neural Architecture Search (NAS) has demonstrated state-of-the-art performance on various computer vision tasks. Despite the superior performance achieved, the efficiency and generality of existing methods are highly valued due to their high computational complexity and low generality. In this paper, we propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning, facilitating a theoretical bound on accuracy and efficiency. In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs. With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints, which is practical for on-device models across diverse search spaces and constraints. The architectures searched by our method achieve remarkable top-1 accuracies, 97.56 and 77.2 on CIFAR-10 and ImageNet (mobile settings), respectively, with the fastest search process, i.e., only 1.8 GPU hours on a Tesla V100. Codes for searching and network generation are available at: https://openi.pcl.ac.cn/PCL AutoML/XNAS.  ( 2 min )
    Long-tailed Classification from a Bayesian-decision-theory Perspective. (arXiv:2303.06075v1 [cs.LG])
    Long-tailed classification poses a challenge due to its heavy imbalance in class probabilities and tail-sensitivity risks with asymmetric misprediction costs. Recent attempts have used re-balancing loss and ensemble methods, but they are largely heuristic and depend heavily on empirical results, lacking theoretical explanation. Furthermore, existing methods overlook the decision loss, which characterizes different costs associated with tailed classes. This paper presents a general and principled framework from a Bayesian-decision-theory perspective, which unifies existing techniques including re-balancing and ensemble methods, and provides theoretical justifications for their effectiveness. From this perspective, we derive a novel objective based on the integrated risk and a Bayesian deep-ensemble approach to improve the accuracy of all classes, especially the ``tail". Besides, our framework allows for task-adaptive decision loss which provides provably optimal decisions in varying task scenarios, along with the capability to quantify uncertainty. Finally, We conduct comprehensive experiments, including standard classification, tail-sensitive classification with a new False Head Rate metric, calibration, and ablation studies. Our framework significantly improves the current SOTA even on large-scale real-world datasets like ImageNet.  ( 2 min )
    Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning. (arXiv:2303.05952v1 [cs.LG])
    Contrastive loss has been increasingly used in learning representations from multiple modalities. In the limit, the nature of the contrastive loss encourages modalities to exactly match each other in the latent space. Yet it remains an open question how the modality alignment affects the downstream task performance. In this paper, based on an information-theoretic argument, we first prove that exact modality alignment is sub-optimal in general for downstream prediction tasks. Hence we advocate that the key of better performance lies in meaningful latent modality structures instead of perfect modality alignment. To this end, we propose three general approaches to construct latent modality structures. Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization. Extensive experiments are conducted on two popular multi-modal representation learning frameworks: the CLIP-based two-tower model and the ALBEF-based fusion model. We test our model on a variety of tasks including zero/few-shot image classification, image-text retrieval, visual question answering, visual reasoning, and visual entailment. Our method achieves consistent improvements over existing methods, demonstrating the effectiveness and generalizability of our proposed approach on latent modality structure regularization.  ( 2 min )
    Rewarding Chatbots for Real-World Engagement with Millions of Users. (arXiv:2303.06135v1 [cs.CL])
    The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate language ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies to measure the level of engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realise a data fly-wheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.  ( 2 min )
    A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms. (arXiv:2303.06058v1 [cs.LG])
    In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists in checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions to prove a logarithmic regret. As a direct application we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS), under various models for the distributions including single parameter exponential families, Gaussian distributions, bounded distributions, or distributions satisfying some conditions on their moments. In particular, we prove that MED is asymptotically optimal for all these models, but also provide a simple regret analysis of some TS algorithms for which the optimality is already known. We then further illustrate the interest of our approach, by analyzing a new Non-Parametric TS algorithm (h-NPTS), adapted to some families of unbounded reward distributions with a bounded h-moment. This model can for instance capture some non-parametric families of distributions whose variance is upper bounded by a known constant.  ( 2 min )
    Multiple Hands Make Light Work: Enhancing Quality and Diversity using MAP-Elites with Multiple Parallel Evolution Strategies. (arXiv:2303.06137v1 [cs.NE])
    With the development of hardware accelerators and their corresponding tools, evaluations have become more affordable through fast and massively parallel evaluations in some applications. This advancement has drastically sped up the runtime of evolution-inspired algorithms such as Quality-Diversity optimization, creating tremendous potential for algorithmic innovation through scale. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD algorithm based on Evolution Strategies (ES) designed for fast parallel evaluations. ME-Multi-ES builds on top of the existing MAP-Elites-ES algorithm, scaling it by maintaining multiple independent ES threads with massive parallelization. We also introduce a new dynamic reset procedure for the lifespan of the independent ES to autonomously maximize the improvement of the QD population. We show experimentally that MEMES outperforms existing gradient-based and objective-agnostic QD algorithms when compared in terms of generations. We perform this comparison on both black-box optimization and QD-Reinforcement Learning tasks, demonstrating the benefit of our approach across different problems and domains. Finally, we also find that our approach intrinsically enables optimization of fitness locally around a niche, a phenomenon not observed in other QD algorithms.  ( 2 min )
    Phase Aberration Correction without Reference Data: An Adaptive Mixed Loss Deep Learning Approach. (arXiv:2303.05747v1 [eess.IV])
    Phase aberration is one of the primary sources of image quality degradation in ultrasound, which is induced by spatial variations in sound speed across the heterogeneous medium. This effect disrupts transmitted waves and prevents coherent summation of echo signals, resulting in suboptimal image quality. In real experiments, obtaining non-aberrated ground truths can be extremely challenging, if not infeasible. It hinders the performance of deep learning-based phase aberration correction techniques due to sole reliance on simulated data and the presence of domain shift between simulated and experimental data. Here, for the first time, we propose a deep learning-based method that does not require reference data to compensate for the phase aberration effect. We train a network wherein both input and target output are randomly aberrated radio frequency (RF) data. Moreover, we demonstrate that a conventional loss function such as mean square error is inadequate for training the network to achieve optimal performance. Instead, we propose an adaptive mixed loss function that employs both B-mode and RF data, resulting in more efficient convergence and enhanced performance. Source code is available at \url{this http URL}.  ( 2 min )
    Hierarchical Neural Program Synthesis. (arXiv:2303.06018v1 [cs.SE])
    Program synthesis aims to automatically construct human-readable programs that satisfy given task specifications, such as input/output pairs or demonstrations. Recent works have demonstrated encouraging results in a variety of domains, such as string transformation, tensor manipulation, and describing behaviors of embodied agents. Most existing program synthesis methods are designed to synthesize programs from scratch, generating a program token by token, line by line. This fundamentally prevents these methods from scaling up to synthesize programs that are longer or more complex. In this work, we present a scalable program synthesis framework that instead synthesizes a program by hierarchically composing programs. Specifically, we first learn a task embedding space and a program decoder that can decode a task embedding into a program. Then, we train a high-level module to comprehend the task specification (e.g., input/output pairs or demonstrations) from long programs and produce a sequence of task embeddings, which are then decoded by the program decoder and composed to yield the synthesized program. We extensively evaluate our proposed framework in a string transformation domain with input/output pairs. The experimental results demonstrate that the proposed framework can synthesize programs that are significantly longer and more complex than the programs considered in prior program synthesis works. Website at https://thoughtp0lice.github.io/hnps_web/  ( 2 min )
    Forecasting Solar Irradiance without Direct Observation: An Empirical Analysis. (arXiv:2303.06010v1 [cs.LG])
    As the use of solar power increases, having accurate and timely forecasters will be essential for smooth grid operators. There are many proposed methods for forecasting solar irradiance / solar power production. However, many of these methods formulate the problem as a time-series, relying on near real-time access to observations at the location of interest to generate forecasts. This requires both access to a real-time stream of data and enough historical observations for these methods to be deployed. In this paper, we conduct a thorough analysis of effective ways to formulate the forecasting problem comparing classical machine learning approaches to state-of-the-art deep learning. Using data from 20 locations distributed throughout the UK and commercially available weather data, we show that it is possible to build systems that do not require access to this data. Leveraging weather observations and measurements from other locations we show it is possible to create models capable of accurately forecasting solar irradiance at new locations. We utilise compare both satellite and ground observations (e.g. temperature, pressure) of weather data. This could facilitate use planning and optimisation for both newly deployed solar farms and domestic installations from the moment they come online. Additionally, we show that training a single global model for multiple locations can produce a more robust model with more consistent and accurate results across locations.  ( 2 min )
    On the Value of Stochastic Side Information in Online Learning. (arXiv:2303.05914v1 [cs.LG])
    We study the effectiveness of stochastic side information in deterministic online learning scenarios. We propose a forecaster to predict a deterministic sequence where its performance is evaluated against an expert class. We assume that certain stochastic side information is available to the forecaster but not the experts. We define the minimax expected regret for evaluating the forecasters performance, for which we obtain both upper and lower bounds. Consequently, our results characterize the improvement in the regret due to the stochastic side information. Compared with the classical online learning problem with regret scales with O(\sqrt(n)), the regret can be negative when the stochastic side information is more powerful than the experts. To illustrate, we apply the proposed bounds to two concrete examples of different types of side information.  ( 2 min )
    The CMA Evolution Strategy: A Tutorial. (arXiv:1604.00772v2 [cs.LG] UPDATED)
    This tutorial introduces the CMA Evolution Strategy (ES), where CMA stands for Covariance Matrix Adaptation. The CMA-ES is a stochastic, or randomized, method for real-parameter (continuous domain) optimization of non-linear, non-convex functions. We try to motivate and derive the algorithm from intuitive concepts and from requirements of non-linear, non-convex search in continuous domain.  ( 2 min )
    Variational Quantum Neural Networks (VQNNS) in Image Classification. (arXiv:2303.05860v1 [quant-ph])
    Quantum machine learning has established as an interdisciplinary field to overcome limitations of classical machine learning and neural networks. This is a field of research which can prove that quantum computers are able to solve problems with complex correlations between inputs that can be hard for classical computers. This suggests that learning models made on quantum computers may be more powerful for applications, potentially faster computation and better generalization on less data. The objective of this paper is to investigate how training of quantum neural network (QNNs) can be done using quantum optimization algorithms for improving the performance and time complexity of QNNs. A classical neural network can be partially quantized to create a hybrid quantum-classical neural network which is used mainly in classification and image recognition. In this paper, a QNN structure is made where a variational parameterized circuit is incorporated as an input layer named as Variational Quantum Neural Network (VQNNs). We encode the cost function of QNNs onto relative phases of a superposition state in the Hilbert space of the network parameters. The parameters are tuned with an iterative quantum approximate optimisation (QAOA) mixer and problem hamiltonians. VQNNs is experimented with MNIST digit recognition (less complex) and crack image classification datasets (more complex) which converges the computation in lesser time than QNN with decent training accuracy.  ( 2 min )
    An analytic theory for the dynamics of wide quantum neural networks. (arXiv:2203.16711v2 [quant-ph] UPDATED)
    Parameterized quantum circuits can be used as quantum neural networks and have the potential to outperform their classical counterparts when trained for addressing learning problems. To date, much of the results on their performance on practical problems are heuristic in nature. In particular, the convergence rate for the training of quantum neural networks is not fully understood. Here, we analyze the dynamics of gradient descent for the training error of a class of variational quantum machine learning models. We define wide quantum neural networks as parameterized quantum circuits in the limit of a large number of qubits and variational parameters. We then find a simple analytic formula that captures the average behavior of their loss function and discuss the consequences of our findings. For example, for random quantum circuits, we predict and characterize an exponential decay of the residual training error as a function of the parameters of the system. We finally validate our analytic results with numerical experiments.  ( 2 min )
    Accelerating ODE-Based Neural Networks on Low-Cost FPGAs. (arXiv:2012.15465v5 [cs.LG] UPDATED)
    ODENet is a deep neural network architecture in which a stacking structure of ResNet is implemented with an ordinary differential equation (ODE) solver. It can reduce the number of parameters and strike a balance between accuracy and performance by selecting a proper solver. It is also possible to improve the accuracy while keeping the same number of parameters on resource-limited edge devices. In this paper, using Euler method as an ODE solver, a part of ODENet is implemented as a dedicated logic on a low-cost FPGA (Field-Programmable Gate Array) board, such as PYNQ-Z2 board. As ODENet variants, reduced ODENets (rODENets) each of which heavily uses a part of ODENet layers and reduces/eliminates some layers differently are proposed and analyzed for low-cost FPGA implementation. They are evaluated in terms of parameter size, accuracy, execution time, and resource utilization on the FPGA. The results show that an overall execution time of an rODENet variant is improved by up to 2.66 times compared to a pure software execution while keeping a comparable accuracy to the original ODENet.  ( 2 min )
    eBPF-based Working Set Size Estimation in Memory Management. (arXiv:2303.05919v1 [cs.PF])
    Working set size estimation (WSS) is of great significance to improve the efficiency of program executing and memory arrangement in modern operating systems. Previous work proposed several methods to estimate WSS, including self-balloning, Zballoning and so on. However, these methods which are based on virtual machine usually cause a large overhead. Thus, using those methods to estimate WSS is impractical. In this paper, we propose a novel framework to efficiently estimate WSS with eBPF (extended Berkeley Packet Filter), a cutting-edge technology which monitors and filters data by being attached to the kernel. With an eBPF program pinned into the kernel, we get the times of page fault and other information of memory allocation. Moreover, we collect WSS via vanilla tool to train a predictive model to complete estimation work with LightGBM, a useful tool which performs well on generating decision trees over continuous value. The experimental results illustrate that our framework can estimate WSS precisely with 98.5\% reduction in overhead compared to traditional methods.  ( 2 min )
    Clinical Courses of Acute Kidney Injury in Hospitalized Patients: A Multistate Analysis. (arXiv:2303.06071v1 [q-bio.QM])
    Objectives: We aim to quantify longitudinal acute kidney injury (AKI) trajectories and to describe transitions through progressing and recovery states and outcomes among hospitalized patients using multistate models. Methods: In this large, longitudinal cohort study, 138,449 adult patients admitted to a quaternary care hospital between 2012 and 2019 were staged based on Kidney Disease: Improving Global Outcomes serum creatinine criteria for the first 14 days of their hospital stay. We fit multistate models to estimate probability of being in a certain clinical state at a given time after entering each one of the AKI stages. We investigated the effects of selected variables on transition rates via Cox proportional hazards regression models. Results: Twenty percent of hospitalized encounters (49,325/246,964) had AKI; among patients with AKI, 66% had Stage 1 AKI, 18% had Stage 2 AKI, and 17% had AKI Stage 3 with or without RRT. At seven days following Stage 1 AKI, 69% (95% confidence interval [CI]: 68.8%-70.5%) were either resolved to No AKI or discharged, while smaller proportions of recovery (26.8%, 95% CI: 26.1%-27.5%) and discharge (17.4%, 95% CI: 16.8%-18.0%) were observed following AKI Stage 2. At 14 days following Stage 1 AKI, patients with more frail conditions (Charlson comorbidity index greater than or equal to 3 and had prolonged ICU stay) had lower proportion of transitioning to No AKI or discharge states. Discussion: Multistate analyses showed that the majority of Stage 2 and higher severity AKI patients could not resolve within seven days; therefore, strategies preventing the persistence or progression of AKI would contribute to the patients' life quality. Conclusions: We demonstrate multistate modeling framework's utility as a mechanism for a better understanding of the clinical course of AKI with the potential to facilitate treatment and resource planning.  ( 2 min )
    Combining Contention-Based Spectrum Access and Adaptive Modulation using Deep Reinforcement Learning. (arXiv:2109.11723v3 [eess.SP] UPDATED)
    The use of unlicensed spectrum for cellular systems to mitigate spectrum scarcity has led to the development of intelligent adaptive approaches to spectrum access that improve upon traditional carrier sensing and listen-before-talk methods. We study decentralized contention-based medium access for base stations (BSs) of a single Radio Access Technology (RAT) operating on unlicensed shared spectrum. We devise a distributed deep reinforcement learning-based algorithm for both contention and adaptive modulation, modelled on a two state Markov decision process, that attempts to maximize a network-wide downlink throughput objective. Empirically, we find the (proportional fairness) reward accumulated by a policy gradient approach to be significantly higher than even a genie-aided adaptive energy detection threshold. Our approaches are further validated by improved sum and peak throughput. The scalability of our approach to large networks is demonstrated via an improved cumulative reward earned on both indoor and outdoor layouts with a large number of BSs.  ( 2 min )
    Depression Diagnosis and Drug Response Prediction via Recurrent Neural Networks and Transformers Utilizing EEG Signals. (arXiv:2303.06033v1 [eess.SP])
    The Early diagnosis and treatment of depression is essential for effective treatment. Depression, while being one of the most common mental illnesses, is still poorly understood in both research and clinical practice. Among different treatments, drug prescription is widely used, however the drug treatment is not effective for many patients. In this work, we propose a method for major depressive disorder (MDD) diagnosis as well as a method for predicting the drug response in patient with MDD using EEG signals. Method: We employ transformers, which are modified recursive neural networks with novel architecture to evaluate the time dependency of time series effectively. We also compare the model to the well-known deep learning schemes such as CNN, LSTM and CNN-LSTM. Results: The transformer achieves an average recall of 99.41% and accuracy of 97.14% for classifying normal and MDD subjects. Furthermore, the transformer also performed well in classifying responders and non-responders to the drug, resulting in 97.01% accuracy and 97.76% Recall. Conclusion: Outperforming other methods on a similar number of parameters, the suggested technique, as a screening tool, seems to have the potential to assist health care professionals in assessing MDD patients for early diagnosis and treatment. Significance: Analyzing EEG signal analysis using transformers, which have replaced the recursive models as a new structure to examine the time dependence of time series, is the main novelty of this research.  ( 2 min )
    Deep Anomaly Detection on Tennessee Eastman Process Data. (arXiv:2303.05904v1 [cs.LG])
    This paper provides the first comprehensive evaluation and analysis of modern (deep-learning) unsupervised anomaly detection methods for chemical process data. We focus on the Tennessee Eastman process dataset, which has been a standard litmus test to benchmark anomaly detection methods for nearly three decades. Our extensive study will facilitate choosing appropriate anomaly detection methods in industrial applications.
    HARDC : A novel ECG-based heartbeat classification method to detect arrhythmia using hierarchical attention based dual structured RNN with dilated CNN. (arXiv:2303.06020v1 [eess.SP])
    In this paper have developed a novel hybrid hierarchical attention-based bidirectional recurrent neural network with dilated CNN (HARDC) method for arrhythmia classification. This solves problems that arise when traditional dilated convolutional neural network (CNN) models disregard the correlation between contexts and gradient dispersion. The proposed HARDC fully exploits the dilated CNN and bidirectional recurrent neural network unit (BiGRU-BiLSTM) architecture to generate fusion features. As a result of incorporating both local and global feature information and an attention mechanism, the model's performance for prediction is improved.By combining the fusion features with a dilated CNN and a hierarchical attention mechanism, the trained HARDC model showed significantly improved classification results and interpretability of feature extraction on the PhysioNet 2017 challenge dataset. Sequential Z-Score normalization, filtering, denoising, and segmentation are used to prepare the raw data for analysis. CGAN (Conditional Generative Adversarial Network) is then used to generate synthetic signals from the processed data. The experimental results demonstrate that the proposed HARDC model significantly outperforms other existing models, achieving an accuracy of 99.60\%, F1 score of 98.21\%, a precision of 97.66\%, and recall of 99.60\% using MIT-BIH generated ECG. In addition, this approach substantially reduces run time when using dilated CNN compared to normal convolution. Overall, this hybrid model demonstrates an innovative and cost-effective strategy for ECG signal compression and high-performance ECG recognition. Our results indicate that an automated and highly computed method to classify multiple types of arrhythmia signals holds considerable promise.
    A hybrid deep-learning-metaheuristic framework to approximate discrete road network design problems. (arXiv:2303.06024v1 [cs.NE])
    This study proposes a hybrid deep-learning-metaheuristic framework with a bi-level architecture to solve road network design problems (NDPs). We train a graph neural network (GNN) to approximate the solution of the user equilibrium (UE) traffic assignment problem, and use inferences made by the trained model to calculate fitness function evaluations of a genetic algorithm (GA) to approximate solutions for NDPs. Using two NDP variants and an exact solver as benchmark, we show that our proposed framework can provide solutions within 5% gap of the global optimum results given less than 1% of the time required for finding the optimal results. Moreover, we observe many interesting future directions, thus we propose a brief research agenda for this topic. The key observation inspiring influential future research was that fitness function evaluation time using the inferences made by the GNN model for the genetic algorithm was in the order of milliseconds, which points to an opportunity and a need for novel heuristics that 1) can cope well with noisy fitness function values provided by neural networks, and 2) can use the significantly higher computation time provided to them to explore the search space effectively (rather than efficiently). This opens a new avenue for a modern class of metaheuristics that are crafted for use with AI-powered predictors.
    Approximate Regions of Attraction in Learning with Decision-Dependent Distributions. (arXiv:2107.00055v3 [cs.LG] UPDATED)
    As data-driven methods are deployed in real-world settings, the processes that generate the observed data will often react to the decisions of the learner. For example, a data source may have some incentive for the algorithm to provide a particular label (e.g. approve a bank loan), and manipulate their features accordingly. Work in strategic classification and decision-dependent distributions seeks to characterize the closed-loop behavior of deploying learning algorithms by explicitly considering the effect of the classifier on the underlying data distribution. More recently, works in performative prediction seek to classify the closed-loop behavior by considering general properties of the mapping from classifier to data distribution, rather than an explicit form. Building on this notion, we analyze repeated risk minimization as the perturbed trajectories of the gradient flows of performative risk minimization. We consider the case where there may be multiple local minimizers of performative risk, motivated by situations where the initial conditions may have significant impact on the long-term behavior of the system. We provide sufficient conditions to characterize the region of attraction for the various equilibria in this settings. Additionally, we introduce the notion of performative alignment, which provides a geometric condition on the convergence of repeated risk minimization to performative risk minimizers.
    Exploring Adversarial Attacks on Neural Networks: An Explainable Approach. (arXiv:2303.06032v1 [cs.LG])
    Deep Learning (DL) is being applied in various domains, especially in safety-critical applications such as autonomous driving. Consequently, it is of great significance to ensure the robustness of these methods and thus counteract uncertain behaviors caused by adversarial attacks. In this paper, we use gradient heatmaps to analyze the response characteristics of the VGG-16 model when the input images are mixed with adversarial noise and statistically similar Gaussian random noise. In particular, we compare the network response layer by layer to determine where errors occurred. Several interesting findings are derived. First, compared to Gaussian random noise, intentionally generated adversarial noise causes severe behavior deviation by distracting the area of concentration in the networks. Second, in many cases, adversarial examples only need to compromise a few intermediate blocks to mislead the final decision. Third, our experiments revealed that specific blocks are more vulnerable and easier to exploit by adversarial examples. Finally, we demonstrate that the layers $Block4\_conv1$ and $Block5\_cov1$ of the VGG-16 model are more susceptible to adversarial attacks. Our work could provide valuable insights into developing more reliable Deep Neural Network (DNN) models.
    Pishgu: Universal Path Prediction Network Architecture for Real-time Cyber-physical Edge Systems. (arXiv:2210.08057v3 [cs.CV] UPDATED)
    Path prediction is an essential task for many real-world Cyber-Physical Systems (CPS) applications, from autonomous driving and traffic monitoring/management to pedestrian/worker safety. These real-world CPS applications need a robust, lightweight path prediction that can provide a universal network architecture for multiple subjects (e.g., pedestrians and vehicles) from different perspectives. However, most existing algorithms are tailor-made for a unique subject with a specific camera perspective and scenario. This article presents Pishgu, a universal lightweight network architecture, as a robust and holistic solution for path prediction. Pishgu's architecture can adapt to multiple path prediction domains with different subjects (vehicles, pedestrians), perspectives (bird's-eye, high-angle), and scenes (sidewalk, highway). Our proposed architecture captures the inter-dependencies within the subjects in each frame by taking advantage of Graph Isomorphism Networks and the attention module. We separately train and evaluate the efficacy of our architecture on three different CPS domains across multiple perspectives (vehicle bird's-eye view, pedestrian bird's-eye view, and human high-angle view). Pishgu outperforms state-of-the-art solutions in the vehicle bird's-eye view domain by 42% and 61% and pedestrian high-angle view domain by 23% and 22% in terms of ADE and FDE, respectively. Additionally, we analyze the domain-specific details for various datasets to understand their effect on path prediction and model interpretation. Finally, we report the latency and throughput for all three domains on multiple embedded platforms showcasing the robustness and adaptability of Pishgu for real-world integration into CPS applications.
    Ignorance is Bliss: Robust Control via Information Gating. (arXiv:2303.06121v1 [cs.LG])
    Informational parsimony -- i.e., using the minimal information required for a task, -- provides a useful inductive bias for learning representations that achieve better generalization by being robust to noise and spurious correlations. We propose information gating in the pixel space as a way to learn more parsimonious representations. Information gating works by learning masks that capture only the minimal information required to solve a given task. Intuitively, our models learn to identify which visual cues actually matter for a given task. We gate information using a differentiable parameterization of the signal-to-noise ratio, which can be applied to arbitrary values in a network, e.g.~masking out pixels at the input layer. We apply our approach, which we call InfoGating, to various objectives such as: multi-step forward and inverse dynamics, Q-learning, behavior cloning, and standard self-supervised tasks. Our experiments show that learning to identify and use minimal information can improve generalization in downstream tasks -- e.g., policies based on info-gated images are considerably more robust to distracting/irrelevant visual features.
    Scatter-based common spatial patterns -- a unified spatial filtering framework. (arXiv:2303.06019v1 [eess.SP])
    The common spatial pattern (CSP) approach is known as one of the most popular spatial filtering techniques for EEG classification in motor imagery (MI) based brain-computer interfaces (BCIs). However, it still suffers some drawbacks such as sensitivity to noise, non-stationarity, and limitation to binary classification.Therefore, we propose a novel spatial filtering framework called scaCSP based on the scatter matrices of spatial covariances of EEG signals, which works generally in both binary and multi-class problems whereas CSP can be cast into our framework as a special case when only the range space of the between-class scatter matrix is used in binary cases.We further propose subspace enhanced scaCSP algorithms which easily permit incorporating more discriminative information contained in other range spaces and null spaces of the between-class and within-class scatter matrices in two scenarios: a nullspace components reduction scenario and an additional spatial filter learning scenario.The proposed algorithms are evaluated on two data sets including 4 MI tasks. The classification performance is compared against state-of-the-art competing algorithms: CSP, Tikhonov regularized CSP (TRCSP), stationary CSP (sCSP) and stationary TRCSP (sTRCSP) in the binary problems whilst multi-class extensions of CSP based on pair-wise and one-versus-rest techniques in the multi-class problems. The results show that the proposed framework outperforms all the competing algorithms in terms of average classification accuracy and computational efficiency in both binary and multi-class problems.The proposed scsCSP works as a unified framework for general multi-class problems and is promising for improving the performance of MI-BCIs.
    EEG Synthetic Data Generation Using Probabilistic Diffusion Models. (arXiv:2303.06068v1 [eess.SP])
    Electroencephalography (EEG) plays a significant role in the Brain Computer Interface (BCI) domain, due to its non-invasive nature, low cost, and ease of use, making it a highly desirable option for widespread adoption by the general public. This technology is commonly used in conjunction with deep learning techniques, the success of which is largely dependent on the quality and quantity of data used for training. To address the challenge of obtaining sufficient EEG data from individual participants while minimizing user effort and maintaining accuracy, this study proposes an advanced methodology for data augmentation: generating synthetic EEG data using denoising diffusion probabilistic models. The synthetic data are generated from electrode-frequency distribution maps (EFDMs) of emotionally labeled EEG recordings. To assess the validity of the synthetic data generated, both a qualitative and a quantitative comparison with real EEG data were successfully conducted. This study opens up the possibility for an open\textendash source accessible and versatile toolbox that can process and generate data in both time and frequency dimensions, regardless of the number of channels involved. Finally, the proposed methodology has potential implications for the broader field of neuroscience research by enabling the creation of large, publicly available synthetic EEG datasets without privacy concerns.
    StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces. (arXiv:2303.06146v1 [cs.CV])
    Recent advances in face manipulation using StyleGAN have produced impressive results. However, StyleGAN is inherently limited to cropped aligned faces at a fixed image resolution it is pre-trained on. In this paper, we propose a simple and effective solution to this limitation by using dilated convolutions to rescale the receptive fields of shallow layers in StyleGAN, without altering any model parameters. This allows fixed-size small features at shallow layers to be extended into larger ones that can accommodate variable resolutions, making them more robust in characterizing unaligned faces. To enable real face inversion and manipulation, we introduce a corresponding encoder that provides the first-layer feature of the extended StyleGAN in addition to the latent style code. We validate the effectiveness of our method using unaligned face inputs of various resolutions in a diverse set of face manipulation tasks, including facial attribute editing, super-resolution, sketch/mask-to-face translation, and face toonification.
    Machine learning for sports betting: should forecasting models be optimised for accuracy or calibration?. (arXiv:2303.06021v1 [cs.LG])
    Sports betting's recent federal legalisation in the USA coincides with the golden age of machine learning. If bettors can leverage data to accurately predict the probability of an outcome, they can recognise when the bookmaker's odds are in their favour. As sports betting is a multi-billion dollar industry in the USA alone, identifying such opportunities could be extremely lucrative. Many researchers have applied machine learning to the sports outcome prediction problem, generally using accuracy to evaluate the performance of forecasting models. We hypothesise that for the sports betting problem, model calibration is more important than accuracy. To test this hypothesis, we train models on NBA data over several seasons and run betting experiments on a single season, using published odds. Evaluating various betting systems, we show that optimising the forecasting model for calibration leads to greater returns than optimising for accuracy, on average (return on investment of $110.42\%$ versus $2.98\%$) and in the best case ($902.01\%$ versus $222.84\%$). These findings suggest that for sports betting (or any forecasting problem where decisions are made based on the predicted probability of each outcome), calibration is a more important metric than accuracy. Sports bettors who wish to increase profits should therefore optimise their forecasting model for calibration.
    Variational formulations of ODE-Net as a mean-field optimal control problem and existence results. (arXiv:2303.05924v1 [math.AP])
    This paper presents a mathematical analysis of ODE-Net, a continuum model of deep neural networks (DNNs). In recent years, Machine Learning researchers have introduced ideas of replacing the deep structure of DNNs with ODEs as a continuum limit. These studies regard the "learning" of ODE-Net as the minimization of a "loss" constrained by a parametric ODE. Although the existence of a minimizer for this minimization problem needs to be assumed, only a few studies have investigated its existence analytically in detail. In the present paper, the existence of a minimizer is discussed based on a formulation of ODE-Net as a measure-theoretic mean-field optimal control problem. The existence result is proved when a neural network, which describes a vector field of ODE-Net, is linear with respect to learnable parameters. The proof employs the measure-theoretic formulation combined with the direct method of Calculus of Variations. Secondly, an idealized minimization problem is proposed to remove the above linearity assumption. Such a problem is inspired by a kinetic regularization associated with the Benamou--Brenier formula and universal approximation theorems for neural networks. The proofs of these existence results use variational methods, differential equations, and mean-field optimal control theory. They will stand for a new analytic way to investigate the learning process of deep neural networks.  ( 2 min )
    Analysis and Evaluation of Explainable Artificial Intelligence on Suicide Risk Assessment. (arXiv:2303.06052v1 [cs.LG])
    This study investigates the effectiveness of Explainable Artificial Intelligence (XAI) techniques in predicting suicide risks and identifying the dominant causes for such behaviours. Data augmentation techniques and ML models are utilized to predict the associated risk. Furthermore, SHapley Additive exPlanations (SHAP) and correlation analysis are used to rank the importance of variables in predictions. Experimental results indicate that Decision Tree (DT), Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) models achieve the best results while DT has the best performance with an accuracy of 95:23% and an Area Under Curve (AUC) of 0.95. As per SHAP results, anger problems, depression, and social isolation are the leading variables in predicting the risk of suicide, and patients with good incomes, respected occupations, and university education have the least risk. Results demonstrate the effectiveness of machine learning and XAI framework for suicide risk prediction, and they can assist psychiatrists in understanding complex human behaviours and can also assist in reliable clinical decision-making.  ( 2 min )
    Distributionally Robust Optimization with Probabilistic Group. (arXiv:2303.05809v1 [cs.LG])
    Modern machine learning models may be susceptible to learning spurious correlations that hold on average but not for the atypical group of samples. To address the problem, previous approaches minimize the empirical worst-group risk. Despite the promise, they often assume that each sample belongs to one and only one group, which does not allow expressing the uncertainty in group labeling. In this paper, we propose a novel framework PG-DRO, which explores the idea of probabilistic group membership for distributionally robust optimization. Key to our framework, we consider soft group membership instead of hard group annotations. The group probabilities can be flexibly generated using either supervised learning or zero-shot approaches. Our framework accommodates samples with group membership ambiguity, offering stronger flexibility and generality than the prior art. We comprehensively evaluate PG-DRO on both image classification and natural language processing benchmarks, establishing superior performance  ( 2 min )
    Classifying the evolution of COVID-19 severity on patients with combined dynamic Bayesian networks and neural networks. (arXiv:2303.05972v1 [cs.LG])
    When we face patients arriving to a hospital suffering from the effects of some illness, one of the main problems we can encounter is evaluating whether or not said patients are going to require intensive care in the near future. This intensive care requires allotting valuable and scarce resources, and knowing beforehand the severity of a patients illness can improve both its treatment and the organization of resources. We illustrate this issue in a dataset consistent of Spanish COVID-19 patients from the sixth epidemic wave where we label patients as critical when they either had to enter the intensive care unit or passed away. We then combine the use of dynamic Bayesian networks, to forecast the vital signs and the blood analysis results of patients over the next 40 hours, and neural networks, to evaluate the severity of a patients disease in that interval of time. Our empirical results show that the transposition of the current state of a patient to future values with the DBN for its subsequent use in classification obtains better the accuracy and g-mean score than a direct application with a classifier.
    Neural Gromov-Wasserstein Optimal Transport. (arXiv:2303.05978v1 [cs.LG])
    We present a scalable neural method to solve the Gromov-Wasserstein (GW) Optimal Transport (OT) problem with the inner product cost. In this problem, given two distributions supported on (possibly different) spaces, one has to find the most isometric map between them. Our proposed approach uses neural networks and stochastic mini-batch optimization which allows to overcome the limitations of existing GW methods such as their poor scalability with the number of samples and the lack of out-of-sample estimation. To demonstrate the effectiveness of our proposed method, we conduct experiments on the synthetic data and explore the practical applicability of our method to the popular task of the unsupervised alignment of word embeddings.  ( 2 min )
    Deep Generative Fixed-filter Active Noise Control. (arXiv:2303.05788v1 [eess.SY])
    Due to the slow convergence and poor tracking ability, conventional LMS-based adaptive algorithms are less capable of handling dynamic noises. Selective fixed-filter active noise control (SFANC) can significantly reduce response time by selecting appropriate pre-trained control filters for different noises. Nonetheless, the limited number of pre-trained control filters may affect noise reduction performance, especially when the incoming noise differs much from the initial noises during pre-training. Therefore, a generative fixed-filter active noise control (GFANC) method is proposed in this paper to overcome the limitation. Based on deep learning and a perfect-reconstruction filter bank, the GFANC method only requires a few prior data (one pre-trained broadband control filter) to automatically generate suitable control filters for various noises. The efficacy of the GFANC method is demonstrated by numerical simulations on real-recorded noises.  ( 2 min )
    TSMixer: An all-MLP Architecture for Time Series Forecasting. (arXiv:2303.06053v1 [cs.LG])
    Real-world time-series datasets are often multivariate with complex dynamics. Commonly-used high capacity architectures like recurrent- or attention-based sequential models have become popular. However, recent work demonstrates that simple univariate linear models can outperform those deep alternatives. In this paper, we investigate the capabilities of linear models for time-series forecasting and present Time-Series Mixer (TSMixer), an architecture designed by stacking multi-layer perceptrons (MLPs). TSMixer is based on mixing operations along time and feature dimensions to extract information efficiently. On popular academic benchmarks, the simple-to-implement TSMixer is comparable to specialized state-of-the-art models that leverage the inductive biases of specific benchmarks. On the challenging and large scale M5 benchmark, a real-world retail dataset, TSMixer demonstrates superior performance compared to the state-of-the-art alternatives. Our results underline the importance of efficiently utilizing cross-variate and auxiliary information for improving the performance of time series forecasting. The design paradigms utilized in TSMixer are expected to open new horizons for deep learning-based time series forecasting.
    Gradient Coordination for Quantifying and Maximizing Knowledge Transference in Multi-Task Learning. (arXiv:2303.05847v1 [cs.IR])
    Multi-task learning (MTL) has been widely applied in online advertising and recommender systems. To address the negative transfer issue, recent studies have proposed optimization methods that thoroughly focus on the gradient alignment of directions or magnitudes. However, since prior study has proven that both general and specific knowledge exist in the limited shared capacity, overemphasizing on gradient alignment may crowd out task-specific knowledge, and vice versa. In this paper, we propose a transference-driven approach CoGrad that adaptively maximizes knowledge transference via Coordinated Gradient modification. We explicitly quantify the transference as loss reduction from one task to another, and then derive an auxiliary gradient from optimizing it. We perform the optimization by incorporating this gradient into original task gradients, making the model automatically maximize inter-task transfer and minimize individual losses. Thus, CoGrad can harmonize between general and specific knowledge to boost overall performance. Besides, we introduce an efficient approximation of the Hessian matrix, making CoGrad computationally efficient and simple to implement. Both offline and online experiments verify that CoGrad significantly outperforms previous methods.
    Estimating friction coefficient using generative modelling. (arXiv:2303.05927v1 [cs.CV])
    It is common to utilise dynamic models to measure the tyre-road friction in real-time. Alternatively, predictive approaches estimate the tyre-road friction by identifying the environmental factors affecting it. This work aims to formulate the problem of friction estimation as a visual perceptual learning task. The problem is broken down into detecting surface characteristics by applying semantic segmentation and using the extracted features to predict the frictional force. This work for the first time formulates the friction estimation problem as a regression from the latent space of a semantic segmentation model. The preliminary results indicate that this approach can estimate frictional force.  ( 2 min )
    wav2vec and its current potential to Automatic Speech Recognition in German for the usage in Digital History: A comparative assessment of available ASR-technologies for the use in cultural heritage contexts. (arXiv:2303.06026v1 [eess.AS])
    In this case study we trained and published a state-of-the-art open-source model for Automatic Speech Recognition (ASR) for German to evaluate the current potential of this technology for the use in the larger context of Digital Humanities and cultural heritage indexation. Along with this paper we publish our wav2vec2 based speech to text model while we evaluate its performance on a corpus of historical recordings we assembled compared against commercial cloud-based and proprietary services. While our model achieves moderate results, we see that proprietary cloud services fare significantly better. As our results show, recognition rates over 90 percent can currently be achieved, however, these numbers drop quickly once the recordings feature limited audio quality or use of non-every day or outworn language. A big issue is the high variety of different dialects and accents in the German language. Nevertheless, this paper highlights that the currently available quality of recognition is high enough to address various use cases in the Digital Humanities. We argue that ASR will become a key technology for the documentation and analysis of audio-visual sources and identify an array of important questions that the DH community and cultural heritage stakeholders will have to address in the near future.  ( 2 min )
    Simulation-based Bayesian inference for robotic grasping. (arXiv:2303.05873v1 [cs.RO])
    General robotic grippers are challenging to control because of their rich nonsmooth contact dynamics and the many sources of uncertainties due to the environment or sensor noise. In this work, we demonstrate how to compute 6-DoF grasp poses using simulation-based Bayesian inference through the full stochastic forward simulation of the robot in its environment while robustly accounting for many of the uncertainties in the system. A Riemannian manifold optimization procedure preserving the nonlinearity of the rotation space is used to compute the maximum a posteriori grasp pose. Simulation and physical benchmarks show the promising high success rate of the approach.  ( 2 min )
    Automotive Perception Software Development: An Empirical Investigation into Data, Annotation, and Ecosystem Challenges. (arXiv:2303.05947v1 [cs.SE])
    Software that contains machine learning algorithms is an integral part of automotive perception, for example, in driving automation systems. The development of such software, specifically the training and validation of the machine learning components, require large annotated datasets. An industry of data and annotation services has emerged to serve the development of such data-intensive automotive software components. Wide-spread difficulties to specify data and annotation needs challenge collaborations between OEMs (Original Equipment Manufacturers) and their suppliers of software components, data, and annotations. This paper investigates the reasons for these difficulties for practitioners in the Swedish automotive industry to arrive at clear specifications for data and annotations. The results from an interview study show that a lack of effective metrics for data quality aspects, ambiguities in the way of working, unclear definitions of annotation quality, and deficits in the business ecosystems are causes for the difficulty in deriving the specifications. We provide a list of recommendations that can mitigate challenges when deriving specifications and we propose future research opportunities to overcome these challenges. Our work contributes towards the on-going research on accountability of machine learning as applied to complex software systems, especially for high-stake applications such as automated driving.  ( 2 min )
    Sliced-Wasserstein on Symmetric Positive Definite Matrices for M/EEG Signals. (arXiv:2303.05798v1 [cs.LG])
    When dealing with electro or magnetoencephalography records, many supervised prediction tasks are solved by working with covariance matrices to summarize the signals. Learning with these matrices requires using Riemanian geometry to account for their structure. In this paper, we propose a new method to deal with distributions of covariance matrices and demonstrate its computational efficiency on M/EEG multivariate time series. More specifically, we define a Sliced-Wasserstein distance between measures of symmetric positive definite matrices that comes with strong theoretical guarantees. Then, we take advantage of its properties and kernel methods to apply this distance to brain-age prediction from MEG data and compare it to state-of-the-art algorithms based on Riemannian geometry. Finally, we show that it is an efficient surrogate to the Wasserstein distance in domain adaptation for Brain Computer Interface applications.  ( 2 min )
    Sleep Quality Prediction from Wearables using Convolution Neural Networks and Ensemble Learning. (arXiv:2303.06028v1 [eess.SP])
    Sleep is among the most important factors affecting one's daily performance, well-being, and life quality. Nevertheless, it became possible to measure it in daily life in an unobtrusive manner with wearable devices. Rather than camera recordings and extraction of the state from the images, wrist-worn devices can measure directly via accelerometer, heart rate, and heart rate variability sensors. Some measured features can be as follows: time to bed, time out of bed, bedtime duration, minutes to fall asleep, and minutes after wake-up. There are several studies in the literature regarding sleep quality and stage prediction. However, they use only wearable data to predict or focus on the sleep stage. In this study, we use the NetHealth dataset, which is collected from 698 college students' via wearables, as well as surveys. Recently, there has been an advancement in deep learning algorithms, and they generally perform better than conventional machine learning techniques. Among them, Convolutional Neural Networks (CNN) have high performances. Thus, in this study, we apply different CNN architectures that have already performed well in the human activity recognition domain and compare their results. We also apply Random Forest (RF) since it performs best among the conventional methods. In future studies, we will compare them with other deep learning algorithms.  ( 2 min )
    Accurate Real-time Polyp Detection in Videos from Concatenation of Latent Features Extracted from Consecutive Frames. (arXiv:2303.05871v1 [cs.CV])
    An efficient deep learning model that can be implemented in real-time for polyp detection is crucial to reducing polyp miss-rate during screening procedures. Convolutional neural networks (CNNs) are vulnerable to small changes in the input image. A CNN-based model may miss the same polyp appearing in a series of consecutive frames and produce unsubtle detection output due to changes in camera pose, lighting condition, light reflection, etc. In this study, we attempt to tackle this problem by integrating temporal information among neighboring frames. We propose an efficient feature concatenation method for a CNN-based encoder-decoder model without adding complexity to the model. The proposed method incorporates extracted feature maps of previous frames to detect polyps in the current frame. The experimental results demonstrate that the proposed method of feature concatenation improves the overall performance of automatic polyp detection in videos. The following results are obtained on a public video dataset: sensitivity 90.94\%, precision 90.53\%, and specificity 92.46%  ( 2 min )
    Training, Architecture, and Prior for Deterministic Uncertainty Methods. (arXiv:2303.05796v1 [cs.LG])
    Accurate and efficient uncertainty estimation is crucial to build reliable Machine Learning (ML) models capable to provide calibrated uncertainty estimates, generalize and detect Out-Of-Distribution (OOD) datasets. To this end, Deterministic Uncertainty Methods (DUMs) is a promising model family capable to perform uncertainty estimation in a single forward pass. This work investigates important design choices in DUMs: (1) we show that training schemes decoupling the core architecture and the uncertainty head schemes can significantly improve uncertainty performances. (2) we demonstrate that the core architecture expressiveness is crucial for uncertainty performance and that additional architecture constraints to avoid feature collapse can deteriorate the trade-off between OOD generalization and detection. (3) Contrary to other Bayesian models, we show that the prior defined by DUMs do not have a strong effect on the final performances.  ( 2 min )
    Product Jacobi-Theta Boltzmann machines with score matching. (arXiv:2303.05910v1 [stat.ML])
    The estimation of probability density functions is a non trivial task that over the last years has been tackled with machine learning techniques. Successful applications can be obtained using models inspired by the Boltzmann machine (BM) architecture. In this manuscript, the product Jacobi-Theta Boltzmann machine (pJTBM) is introduced as a restricted version of the Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection matrix. We show that score matching, based on the Fisher divergence, can be used to fit probability densities with the pJTBM more efficiently than with the original RTBM.
    TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets. (arXiv:2303.05762v1 [cs.LG])
    Diffusion models have achieved great success in a range of tasks, such as image synthesis and molecule design. As such successes hinge on large-scale training data collected from diverse sources, the trustworthiness of these collected data is hard to control or audit. In this work, we aim to explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer: How hard is it to perform Trojan attacks on well-trained diffusion models? What are the adversarial targets that such Trojan attacks can achieve? To answer these questions, we propose an effective Trojan attack against diffusion models, TrojDiff, which optimizes the Trojan diffusion and generative processes during training. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution and propose a new parameterization of the Trojan generative process that leads to an effective training objective for the attack. In addition, we consider three types of adversarial targets: the Trojaned diffusion models will always output instances belonging to a certain class from the in-domain distribution (In-D2D attack), out-of-domain distribution (Out-D2D-attack), and one specific instance (D2I attack). We evaluate TrojDiff on CIFAR-10 and CelebA datasets against both DDPM and DDIM diffusion models. We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers, while the performance in benign environments is preserved. The code is available at https://github.com/chenweixin107/TrojDiff.  ( 2 min )
    Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation. (arXiv:2303.05785v1 [eess.IV])
    With the inspiration of vision transformers, the concept of depth-wise convolution revisits to provide a large Effective Receptive Field (ERF) using Large Kernel (LK) sizes for medical image segmentation. However, the segmentation performance might be saturated and even degraded as the kernel sizes scaled up (e.g., $21\times 21\times 21$) in a Convolutional Neural Network (CNN). We hypothesize that convolution with LK sizes is limited to maintain an optimal convergence for locality learning. While Structural Re-parameterization (SR) enhances the local convergence with small kernels in parallel, optimal small kernel branches may hinder the computational efficiency for training. In this work, we propose RepUX-Net, a pure CNN architecture with a simple large kernel block design, which competes favorably with current network state-of-the-art (SOTA) (e.g., 3D UX-Net, SwinUNETR) using 6 challenging public datasets. We derive an equivalency between kernel re-parameterization and the branch-wise variation in kernel convergence. Inspired by the spatial frequency in the human visual system, we extend to vary the kernel convergence into element-wise setting and model the spatial frequency as a Bayesian prior to re-parameterize convolutional weights during training. Specifically, a reciprocal function is leveraged to estimate a frequency-weighted value, which rescales the corresponding kernel element for stochastic gradient descent. From the experimental results, RepUX-Net consistently outperforms 3D SOTA benchmarks with internal validation (FLARE: 0.929 to 0.944), external validation (MSD: 0.901 to 0.932, KiTS: 0.815 to 0.847, LiTS: 0.933 to 0.949, TCIA: 0.736 to 0.779) and transfer learning (AMOS: 0.880 to 0.911) scenarios in Dice Score.  ( 2 min )
    Semi-supervised Adversarial Learning for Complementary Item Recommendation. (arXiv:2303.05812v1 [cs.IR])
    Complementary item recommendations are a ubiquitous feature of modern e-commerce sites. Such recommendations are highly effective when they are based on collaborative signals like co-purchase statistics. In certain online marketplaces, however, e.g., on online auction sites, constantly new items are added to the catalog. In such cases, complementary item recommendations are often based on item side-information due to a lack of interaction data. In this work, we propose a novel approach that can leverage both item side-information and labeled complementary item pairs to generate effective complementary recommendations for cold items, i.e., for items for which no co-purchase statistics yet exist. Given that complementary items typically have to be of a different category than the seed item, we technically maintain a latent space for each item category. Simultaneously, we learn to project distributed item representations into these category spaces to determine suitable recommendations. The main learning process in our architecture utilizes labeled pairs of complementary items. In addition, we adopt ideas from Cycle Generative Adversarial Networks (CycleGAN) to leverage available item information even in case no labeled data exists for a given item and category. Experiments on three e-commerce datasets show that our method is highly effective.  ( 2 min )
    Lifelong Machine Learning Potentials. (arXiv:2303.05911v1 [cs.LG])
    Machine learning potentials (MLPs) trained on accurate quantum chemical data can retain the high accuracy, while inflicting little computational demands. On the downside, they need to be trained for each individual system. In recent years, a vast number of MLPs has been trained from scratch because learning additional data typically requires to train again on all data to not forget previously acquired knowledge. Additionally, most common structural descriptors of MLPs cannot represent efficiently a large number of different chemical elements. In this work, we tackle these problems by introducing element-embracing atom-centered symmetry functions (eeACSFs) which combine structural properties and element information from the periodic table. These eeACSFs are a key for our development of a lifelong machine learning potential (lMLP). Uncertainty quantification can be exploited to transgress a fixed, pre-trained MLP to arrive at a continuously adapting lMLP, because a predefined level of accuracy can be ensured. To extend the applicability of an lMLP to new systems, we apply continual learning strategies to enable autonomous and on-the-fly training on a continuous stream of new data. For the training of deep neural networks, we propose the continual resilient (CoRe) optimizer and incremental learning strategies relying on rehearsal of data, regularization of parameters, and the architecture of the model.
    Contrastive Language-Image Pretrained (CLIP) Models are Powerful Out-of-Distribution Detectors. (arXiv:2303.05828v1 [cs.CV])
    We present a comprehensive experimental study on pretrained feature extractors for visual out-of-distribution (OOD) detection. We examine several setups, based on the availability of labels or image captions and using different combinations of in- and out-distributions. Intriguingly, we find that (i) contrastive language-image pretrained models achieve state-of-the-art unsupervised out-of-distribution performance using nearest neighbors feature similarity as the OOD detection score, (ii) supervised state-of-the-art OOD detection performance can be obtained without in-distribution fine-tuning, (iii) even top-performing billion-scale vision transformers trained with natural language supervision fail at detecting adversarially manipulated OOD images. Finally, we argue whether new benchmarks for visual anomaly detection are needed based on our experiments. Using the largest publicly available vision transformer, we achieve state-of-the-art performance across all $18$ reported OOD benchmarks, including an AUROC of 87.6\% (9.2\% gain, unsupervised) and 97.4\% (1.2\% gain, supervised) for the challenging task of CIFAR100 $\rightarrow$ CIFAR10 OOD detection. The code will be open-sourced.  ( 2 min )
    Self-Supervised CSF Inpainting with Synthetic Atrophy for Improved Accuracy Validation of Cortical Surface Analyses. (arXiv:2303.05777v1 [eess.IV])
    Accuracy validation of cortical thickness measurement is a difficult problem due to the lack of ground truth data. To address this need, many methods have been developed to synthetically induce gray matter (GM) atrophy in an MRI via deformable registration, creating a set of images with known changes in cortical thickness. However, these methods often cause blurring in atrophied regions, and cannot simulate realistic atrophy within deep sulci where cerebrospinal fluid (CSF) is obscured or absent. In this paper, we present a solution using a self-supervised inpainting model to generate CSF in these regions and create images with more plausible GM/CSF boundaries. Specifically, we introduce a novel, 3D GAN model that incorporates patch-based dropout training, edge map priors, and sinusoidal positional encoding, all of which are established methods previously limited to 2D domains. We show that our framework significantly improves the quality of the resulting synthetic images and is adaptable to unseen data with fine-tuning. We also demonstrate that our resulting dataset can be employed for accuracy validation of cortical segmentation and thickness measurement.  ( 2 min )
    Distribution Preserving Source Separation With Time Frequency Predictive Models. (arXiv:2303.05896v1 [eess.AS])
    We provide an example of a distribution preserving source separation method, which aims at addressing perceptual shortcomings of state-of-the-art methods. Our approach uses unconditioned generative models of signal sources. Reconstruction is achieved by means of mix-consistent sampling from a distribution conditioned on a realization of a mix. The separated signals follow their respective source distributions, which provides an advantage when separation results are evaluated in a listening test.  ( 2 min )
    NFL Career Success as Predicted by NFL Scouting Combine. (arXiv:2303.05774v1 [cs.LG])
    The National Football League (NFL) Scouting Combine serves as a tool to evaluate the skills of prospective players and assess their readiness to play in the NFL. The development of machine learning brings new opportunities in assessing the utility of the Scouting Combine. Using machine and statistical learning, it may be possible to predict future success of prospective athletes, as well as predict which Scouting Combine tests are the most important. Results from statistical learning research have been contradicting whether the Scouting combine is a useful metric for player success. In this study, we investigate if machine learning can be used to determine matriculation and future success in the NFL. Using Scouting Combine data, we evaluate six different algorithms' ability to predict whether a potential draft pick will play a single NFL snap (matriculation). If a player is drafted, we predict how many snaps they go on to play (success). We are able to predict matriculation with 83% accuracy; however, we are unable to predict later success. Our best performing algorithm returns large error and low explained variance (RMSE=1,210 snaps; ${R}^2$=0.17). These findings indicate that while the Scouting Combine can predict NFL matriculation, it may not be a reliable predictor of long-term player success.  ( 2 min )
    Fast Diffusion Sampler for Inverse Problems by Geometric Decomposition. (arXiv:2303.05754v1 [cs.LG])
    Diffusion models have shown exceptional performance in solving inverse problems. However, one major limitation is the slow inference time. While faster diffusion samplers have been developed for unconditional sampling, there has been limited research on conditional sampling in the context of inverse problems. In this study, we propose a novel and efficient diffusion sampling strategy that employs the geometric decomposition of diffusion sampling. Specifically, we discover that the samples generated from diffusion models can be decomposed into two orthogonal components: a ``denoised" component obtained by projecting the sample onto the clean data manifold, and a ``noise" component that induces a transition to the next lower-level noisy manifold with the addition of stochastic noise. Furthermore, we prove that, under some conditions on the clean data manifold, the conjugate gradient update for imposing conditioning from the denoised signal belongs to the clean manifold, resulting in a much faster and more accurate diffusion sampling. Our method is applicable regardless of the parameterization and setting (i.e., VE, VP). Notably, we achieve state-of-the-art reconstruction quality on challenging real-world medical inverse imaging problems, including multi-coil MRI reconstruction and 3D CT reconstruction. Moreover, our proposed method achieves more than 80 times faster inference time than the previous state-of-the-art method.  ( 2 min )
    GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose. (arXiv:2303.05652v1 [cs.CV])
    3D human mesh recovery from a 2D pose plays an important role in various applications. However, it is hard for existing methods to simultaneously capture the multiple relations during the evolution from skeleton to mesh, including joint-joint, joint-vertex and vertex-vertex relations, which often leads to implausible results. To address this issue, we propose a novel solution, called GATOR, that contains an encoder of Graph-Aware Transformer (GAT) and a decoder with Motion-Disentangled Regression (MDR) to explore these multiple relations. Specifically, GAT combines a GCN and a graph-aware self-attention in parallel to capture physical and hidden joint-joint relations. Furthermore, MDR models joint-vertex and vertex-vertex interactions to explore joint and vertex relations. Based on the clustering characteristics of vertex offset fields, MDR regresses the vertices by composing the predicted base motions. Extensive experiments show that GATOR achieves state-of-the-art performance on two challenging benchmarks.  ( 2 min )
    Decision-Making Under Uncertainty: Beyond Probabilities. (arXiv:2303.05848v1 [cs.AI])
    This position paper reflects on the state-of-the-art in decision-making under uncertainty. A classical assumption is that probabilities can sufficiently capture all uncertainty in a system. In this paper, the focus is on the uncertainty that goes beyond this classical interpretation, particularly by employing a clear distinction between aleatoric and epistemic uncertainty. The paper features an overview of Markov decision processes (MDPs) and extensions to account for partial observability and adversarial behavior. These models sufficiently capture aleatoric uncertainty but fail to account for epistemic uncertainty robustly. Consequently, we present a thorough overview of so-called uncertainty models that exhibit uncertainty in a more robust interpretation. We show several solution techniques for both discrete and continuous models, ranging from formal verification, over control-based abstractions, to reinforcement learning. As an integral part of this paper, we list and discuss several key challenges that arise when dealing with rich types of uncertainty in a model-based fashion.  ( 2 min )
    Position Paper on Dataset Engineering to Accelerate Science. (arXiv:2303.05545v1 [cs.LG])
    Data is a critical element in any discovery process. In the last decades, we observed exponential growth in the volume of available data and the technology to manipulate it. However, data is only practical when one can structure it for a well-defined task. For instance, we need a corpus of text broken into sentences to train a natural language machine-learning model. In this work, we will use the token \textit{dataset} to designate a structured set of data built to perform a well-defined task. Moreover, the dataset will be used in most cases as a blueprint of an entity that at any moment can be stored as a table. Specifically, in science, each area has unique forms to organize, gather and handle its datasets. We believe that datasets must be a first-class entity in any knowledge-intensive process, and all workflows should have exceptional attention to datasets' lifecycle, from their gathering to uses and evolution. We advocate that science and engineering discovery processes are extreme instances of the need for such organization on datasets, claiming for new approaches and tooling. Furthermore, these requirements are more evident when the discovery workflow uses artificial intelligence methods to empower the subject-matter expert. In this work, we discuss an approach to bringing datasets as a critical entity in the discovery process in science. We illustrate some concepts using material discovery as a use case. We chose this domain because it leverages many significant problems that can be generalized to other science fields.  ( 2 min )
    On the effectiveness of neural priors in modeling dynamical systems. (arXiv:2303.05728v1 [cs.LG])
    Modelling dynamical systems is an integral component for understanding the natural world. To this end, neural networks are becoming an increasingly popular candidate owing to their ability to learn complex functions from large amounts of data. Despite this recent progress, there has not been an adequate discussion on the architectural regularization that neural networks offer when learning such systems, hindering their efficient usage. In this paper, we initiate a discussion in this direction using coordinate networks as a test bed. We interpret dynamical systems and coordinate networks from a signal processing lens, and show that simple coordinate networks with few layers can be used to solve multiple problems in modelling dynamical systems, without any explicit regularizers.  ( 2 min )
    Feature Unlearning for Generative Models via Implicit Feedback. (arXiv:2303.05699v1 [cs.CV])
    We tackle the problem of feature unlearning from a pretrained image generative model. Unlike a common unlearning task where an unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle from facial images, from the pretrained generative models. As the target feature is only presented in a local region of an image, unlearning the entire image from the pretrained model may result in losing other details in the remaining region of the image. To specify which features to unlearn, we develop an implicit feedback mechanism where a user can select images containing the target feature. From the implicit feedback, we identify a latent representation corresponding to the target feature and then use the representation to unlearn the generative model. Our framework is generalizable for the two well-known families of generative models: GANs and VAEs. Through experiments on MNIST and CelebA datasets, we show that target features are successfully removed while keeping the fidelity of the original models.  ( 2 min )
    Explaining Model Confidence Using Counterfactuals. (arXiv:2303.05729v1 [cs.AI])
    Displaying confidence scores in human-AI interaction has been shown to help build trust between humans and AI systems. However, most existing research uses only the confidence score as a form of communication. As confidence scores are just another model output, users may want to understand why the algorithm is confident to determine whether to accept the confidence score. In this paper, we show that counterfactual explanations of confidence scores help study participants to better understand and better trust a machine learning model's prediction. We present two methods for understanding model confidence using counterfactual explanation: (1) based on counterfactual examples; and (2) based on visualisation of the counterfactual space. Both increase understanding and trust for study participants over a baseline of no explanation, but qualitative results show that they are used quite differently, leading to recommendations of when to use each one and directions of designing better explanations.  ( 2 min )
    Pacos: Modeling Users' Interpretable and Context-Dependent Choices in Preference Reversals. (arXiv:2303.05648v1 [cs.IR])
    Choice problems refer to selecting the best choices from several items, and learning users' preferences in choice problems is of great significance in understanding the decision making mechanisms and providing personalized services. Existing works typically assume that people evaluate items independently. In practice, however, users' preferences depend on the market in which items are placed, which is known as context effects; and the order of users' preferences for two items may even be reversed, which is referred to preference reversals. In this work, we identify three factors contributing to context effects: users' adaptive weights, the inter-item comparison, and display positions. We propose a context-dependent preference model named Pacos as a unified framework for addressing three factors simultaneously, and consider two design methods including an additive method with high interpretability and an ANN-based method with high accuracy. We study the conditions for preference reversals to occur and provide an theoretical proof of the effectiveness of Pacos in addressing preference reversals. Experimental results show that the proposed method has better performance than prior works in predicting users' choices, and has great interpretability to help understand the cause of preference reversals.  ( 2 min )
    Gaussian Max-Value Entropy Search for Multi-Agent Bayesian Optimization. (arXiv:2303.05694v1 [cs.LG])
    We study the multi-agent Bayesian optimization (BO) problem, where multiple agents maximize a black-box function via iterative queries. We focus on Entropy Search (ES), a sample-efficient BO algorithm that selects queries to maximize the mutual information about the maximum of the black-box function. One of the main challenges of ES is that calculating the mutual information requires computationally-costly approximation techniques. For multi-agent BO problems, the computational cost of ES is exponential in the number of agents. To address this challenge, we propose the Gaussian Max-value Entropy Search, a multi-agent BO algorithm with favorable sample and computational efficiency. The key to our idea is to use a normal distribution to approximate the function maximum and calculate its mutual information accordingly. The resulting approximation allows queries to be cast as the solution of a closed-form optimization problem which, in turn, can be solved via a modified gradient ascent algorithm and scaled to a large number of agents. We demonstrate the effectiveness of Gaussian max-value Entropy Search through numerical experiments on standard test functions and real-robot experiments on the source-seeking problem. Results show that the proposed algorithm outperforms the multi-agent BO baselines in the numerical experiments and can stably seek the source with a limited number of noisy observations on real robots.  ( 2 min )
    On the Unlikelihood of D-Separation. (arXiv:2303.05628v1 [cs.LG])
    Causal discovery aims to recover a causal graph from data generated by it; constraint based methods do so by searching for a d-separating conditioning set of nodes in the graph via an oracle. In this paper, we provide analytic evidence that on large graphs, d-separation is a rare phenomenon, even when guaranteed to exist, unless the graph is extremely sparse. We then provide an analytic average case analysis of the PC Algorithm for causal discovery, as well as a variant of the SGS Algorithm we call UniformSGS. We consider a set $V=\{v_1,\ldots,v_n\}$ of nodes, and generate a random DAG $G=(V,E)$ where $(v_a, v_b) \in E$ with i.i.d. probability $p_1$ if $a b$. We provide upper bounds on the probability that a subset of $V-\{x,y\}$ d-separates $x$ and $y$, conditional on $x$ and $y$ being d-separable; our upper bounds decay exponentially fast to $0$ as $|V| \rightarrow \infty$. For the PC Algorithm, while it is known that its worst-case guarantees fail on non-sparse graphs, we show that the same is true for the average case, and that the sparsity requirement is quite demanding: for good performance, the density must go to $0$ as $|V| \rightarrow \infty$ even in the average case. For UniformSGS, while it is known that the running time is exponential for existing edges, we show that in the average case, that is the expected running time for most non-existing edges as well.  ( 2 min )
    Explainable Semantic Medical Image Segmentation with Style. (arXiv:2303.05696v1 [eess.IV])
    Semantic medical image segmentation using deep learning has recently achieved high accuracy, making it appealing to clinical problems such as radiation therapy. However, the lack of high-quality semantically labelled data remains a challenge leading to model brittleness to small shifts to input data. Most works require extra data for semi-supervised learning and lack the interpretability of the boundaries of the training data distribution during training, which is essential for model deployment in clinical practice. We propose a fully supervised generative framework that can achieve generalisable segmentation with only limited labelled data by simultaneously constructing an explorable manifold during training. The proposed approach creates medical image style paired with a segmentation task driven discriminator incorporating end-to-end adversarial training. The discriminator is generalised to small domain shifts as much as permissible by the training data, and the generator automatically diversifies the training samples using a manifold of input features learnt during segmentation. All the while, the discriminator guides the manifold learning by supervising the semantic content and fine-grained features separately during the image diversification. After training, visualisation of the learnt manifold from the generator is available to interpret the model limits. Experiments on a fully semantic, publicly available pelvis dataset demonstrated that our method is more generalisable to shifts than other state-of-the-art methods while being more explainable using an explorable manifold.  ( 2 min )
    Hierarchical Clustering with OWA-based Linkages, the Lance-Williams Formula, and Dendrogram Inversions. (arXiv:2303.05683v1 [stat.ML])
    Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide some conditions for the weight generators to guarantee the resulting dendrograms to be free from unaesthetic inversions.  ( 2 min )
    KGNv2: Separating Scale and Pose Prediction for Keypoint-based 6-DoF Grasp Pose Synthesis on RGB-D input. (arXiv:2303.05617v1 [cs.RO])
    We propose a new 6-DoF grasp pose synthesis approach from 2D/2.5D input based on keypoints. Keypoint-based grasp detector from image input has demonstrated promising results in the previous study, where the additional visual information provided by color images compensates for the noisy depth perception. However, it relies heavily on accurately predicting the location of keypoints in the image space. In this paper, we devise a new grasp generation network that reduces the dependency on precise keypoint estimation. Given an RGB-D input, our network estimates both the grasp pose from keypoint detection as well as scale towards the camera. We further re-design the keypoint output space in order to mitigate the negative impact of keypoint prediction noise to Perspective-n-Point (PnP) algorithm. Experiments show that the proposed method outperforms the baseline by a large margin, validating the efficacy of our approach. Finally, despite trained on simple synthetic objects, our method demonstrate sim-to-real capacity by showing competitive results in real-world robot experiments.  ( 2 min )
    Towards better traffic volume estimation: Tackling both underdetermined and non-equilibrium problems via a correlation adaptive graph convolution network. (arXiv:2303.05660v1 [stat.ML])
    Traffic volume is an indispensable ingredient to provide fine-grained information for traffic management and control. However, due to limited deployment of traffic sensors, obtaining full-scale volume information is far from easy. Existing works on this topic primarily focus on improving the overall estimation accuracy of a particular method and ignore the underlying challenges of volume estimation, thereby having inferior performances on some critical tasks. This paper studies two key problems with regard to traffic volume estimation: (1) underdetermined traffic flows caused by undetected movements, and (2) non-equilibrium traffic flows arise from congestion propagation. Here we demonstrate a graph-based deep learning method that can offer a data-driven, model-free and correlation adaptive approach to tackle the above issues and perform accurate network-wide traffic volume estimation. Particularly, in order to quantify the dynamic and nonlinear relationships between traffic speed and volume for the estimation of underdetermined flows, a speed patternadaptive adjacent matrix based on graph attention is developed and integrated into the graph convolution process, to capture non-local correlations between sensors. To measure the impacts of non-equilibrium flows, a temporal masked and clipped attention combined with a gated temporal convolution layer is customized to capture time-asynchronous correlations between upstream and downstream sensors. We then evaluate our model on a real-world highway traffic volume dataset and compare it with several benchmark models. It is demonstrated that the proposed model achieves high estimation accuracy even under 20% sensor coverage rate and outperforms other baselines significantly, especially on underdetermined and non-equilibrium flow locations. Furthermore, comprehensive quantitative model analysis are also carried out to justify the model designs.  ( 2 min )
    Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers. (arXiv:2303.05691v1 [cs.CV])
    Despite the impressive performance of vision-based pose estimators, they generally fail to perform well under adverse vision conditions and often don't satisfy the privacy demands of customers. As a result, researchers have begun to study tactile sensing systems as an alternative. However, these systems suffer from noisy and ambiguous recordings. To tackle this problem, we propose a novel solution for pose estimation from ambiguous pressure data. Our method comprises a spatio-temporal vision transformer with an encoder-decoder architecture. Detailed experiments on two popular public datasets reveal that our model outperforms existing solutions in the area. Moreover, we observe that increasing the number of temporal crops in the early stages of the network positively impacts the performance while pre-training the network in a self-supervised setting using a masked auto-encoder approach also further improves the results.  ( 2 min )
    On the Soundness of XAI in Prognostics and Health Management (PHM). (arXiv:2303.05517v1 [cs.LG])
    The aim of Predictive Maintenance, within the field of Prognostics and Health Management (PHM), is to identify and anticipate potential issues in the equipment before these become critical. The main challenge to be addressed is to assess the amount of time a piece of equipment will function effectively before it fails, which is known as Remaining Useful Life (RUL). Deep Learning (DL) models, such as Deep Convolutional Neural Networks (DCNN) and Long Short-Term Memory (LSTM) networks, have been widely adopted to address the task, with great success. However, it is well known that this kind of black box models are opaque decision systems, and it may be hard to explain its outputs to stakeholders (experts in the industrial equipment). Due to the large number of parameters that determine the behavior of these complex models, understanding the reasoning behind the predictions is challenging. This work presents a critical and comparative revision on a number of XAI methods applied on time series regression model for PM. The aim is to explore XAI methods within time series regression, which have been less studied than those for time series classification. The model used during the experimentation is a DCNN trained to predict the RUL of an aircraft engine. The methods are reviewed and compared using a set of metrics that quantifies a number of desirable properties that any XAI method should fulfill. The results show that GRAD-CAM is the most robust method, and that the best layer is not the bottom one, as is commonly seen within the context of Image Processing.  ( 2 min )
    Computably Continuous Reinforcement-Learning Objectives are PAC-learnable. (arXiv:2303.05518v1 [cs.LG])
    In reinforcement learning, the classic objectives of maximizing discounted and finite-horizon cumulative rewards are PAC-learnable: There are algorithms that learn a near-optimal policy with high probability using a finite amount of samples and computation. In recent years, researchers have introduced objectives and corresponding reinforcement-learning algorithms beyond the classic cumulative rewards, such as objectives specified as linear temporal logic formulas. However, questions about the PAC-learnability of these new objectives have remained open. This work demonstrates the PAC-learnability of general reinforcement-learning objectives through sufficient conditions for PAC-learnability in two analysis settings. In particular, for the analysis that considers only sample complexity, we prove that if an objective given as an oracle is uniformly continuous, then it is PAC-learnable. Further, for the analysis that considers computational complexity, we prove that if an objective is computable, then it is PAC-learnable. In other words, if a procedure computes successive approximations of the objective's value, then the objective is PAC-learnable. We give three applications of our condition on objectives from the literature with previously unknown PAC-learnability and prove that these objectives are PAC-learnable. Overall, our result helps verify existing objectives' PAC-learnability. Also, as some studied objectives that are not uniformly continuous have been shown to be not PAC-learnable, our results could guide the design of new PAC-learnable objectives.  ( 2 min )
    Optimal active particle navigation meets machine learning. (arXiv:2303.05558v1 [cond-mat.soft])
    The question of how "smart" active agents, like insects, microorganisms, or future colloidal robots need to steer to optimally reach or discover a target, such as an odor source, food, or a cancer cell in a complex environment has recently attracted great interest. Here, we provide an overview of recent developments, regarding such optimal navigation problems, from the micro- to the macroscale, and give a perspective by discussing some of the challenges which are ahead of us. Besides exemplifying an elementary approach to optimal navigation problems, the article focuses on works utilizing machine learning-based methods. Such learning-based approaches can uncover highly efficient navigation strategies even for problems that involve e.g. chaotic, high-dimensional, or unknown environments and are hardly solvable based on conventional analytical or simulation methods.  ( 2 min )
    A Lite Fireworks Algorithm with Fractal Dimension Constraint for Feature Selection. (arXiv:2303.05516v1 [cs.LG])
    As the use of robotics becomes more widespread, the huge amount of vision data leads to a dramatic increase in data dimensionality. Although deep learning methods can effectively process these high-dimensional vision data. Due to the limitation of computational resources, some special scenarios still rely on traditional machine learning methods. However, these high-dimensional visual data lead to great challenges for traditional machine learning methods. Therefore, we propose a Lite Fireworks Algorithm with Fractal Dimension constraint for feature selection (LFWA+FD) and use it to solve the feature selection problem driven by robot vision. The "LFWA+FD" focuses on searching the ideal feature subset by simplifying the fireworks algorithm and constraining the dimensionality of selected features by fractal dimensionality, which in turn reduces the approximate features and reduces the noise in the original data to improve the accuracy of the model. The comparative experimental results of two publicly available datasets from UCI show that the proposed method can effectively select a subset of features useful for model inference and remove a large amount of noise noise present in the original data to improve the performance.  ( 2 min )
    EfficientTempNet: Temporal Super-Resolution of Radar Rainfall. (arXiv:2303.05552v1 [cs.CV])
    Rainfall data collected by various remote sensing instruments such as radars or satellites has different space-time resolutions. This study aims to improve the temporal resolution of radar rainfall products to help with more accurate climate change modeling and studies. In this direction, we introduce a solution based on EfficientNetV2, namely EfficientTempNet, to increase the temporal resolution of radar-based rainfall products from 10 minutes to 5 minutes. We tested EfficientRainNet over a dataset for the state of Iowa, US, and compared its performance to three different baselines to show that EfficientTempNet presents a viable option for better climate change monitoring.  ( 2 min )
    Hardware Acceleration of Neural Graphics. (arXiv:2303.05735v1 [cs.AR])
    Rendering and inverse-rendering algorithms that drive conventional computer graphics have recently been superseded by neural representations (NR). NRs have recently been used to learn the geometric and the material properties of the scenes and use the information to synthesize photorealistic imagery, thereby promising a replacement for traditional rendering algorithms with scalable quality and predictable performance. In this work we ask the question: Does neural graphics (NG) need hardware support? We studied representative NG applications showing that, if we want to render 4k res. at 60FPS there is a gap of 1.5X-55X in the desired performance on current GPUs. For AR/VR applications, there is an even larger gap of 2-4 OOM between the desired performance and the required system power. We identify that the input encoding and the MLP kernels are the performance bottlenecks, consuming 72%,60% and 59% of application time for multi res. hashgrid, multi res. densegrid and low res. densegrid encodings, respectively. We propose a NG processing cluster, a scalable and flexible hardware architecture that directly accelerates the input encoding and MLP kernels through dedicated engines and supports a wide range of NG applications. We also accelerate the rest of the kernels by fusing them together in Vulkan, which leads to 9.94X kernel-level performance improvement compared to un-fused implementation of the pre-processing and the post-processing kernels. Our results show that, NGPC gives up to 58X end-to-end application-level performance improvement, for multi res. hashgrid encoding on average across the four NG applications, the performance benefits are 12X,20X,33X and 39X for the scaling factor of 8,16,32 and 64, respectively. Our results show that with multi res. hashgrid encoding, NGPC enables the rendering of 4k res. at 30FPS for NeRF and 8k res. at 120FPS for all our other NG applications.  ( 2 min )
    A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning. (arXiv:2303.05710v1 [cs.DB])
    Recently using machine learning (ML) based techniques to optimize modern database management systems has attracted intensive interest from both industry and academia. With an objective to tune a specific component of a DBMS (e.g., index selection, knobs tuning), the ML-based tuning agents have shown to be able to find better configurations than experienced database administrators. However, one critical yet challenging question remains unexplored -- how to make those ML-based tuning agents work collaboratively. Existing methods do not consider the dependencies among the multiple agents, and the model used by each agent only studies the effect of changing the configurations in a single component. To tune different components for DBMS, a coordinating mechanism is needed to make the multiple agents cognizant of each other. Also, we need to decide how to allocate the limited tuning budget among the agents to maximize the performance. Such a decision is difficult to make since the distribution of the reward for each agent is unknown and non-stationary. In this paper, we study the above question and present a unified coordinating framework to efficiently utilize existing ML-based agents. First, we propose a message propagation protocol that specifies the collaboration behaviors for agents and encapsulates the global tuning messages in each agent's model. Second, we combine Thompson Sampling, a well-studied reinforcement learning algorithm with a memory buffer so that our framework can allocate budget judiciously in a non-stationary environment. Our framework defines the interfaces adapted to a broad class of ML-based tuning agents, yet simple enough for integration with existing implementations and future extensions. We show that it can effectively utilize different ML-based agents and find better configurations with 1.4~14.1X speedups on the workload execution time compared with baselines.  ( 2 min )
    Self-Supervised One-Shot Learning for Automatic Segmentation of StyleGAN Images. (arXiv:2303.05639v1 [cs.CV])
    We propose in this paper a framework for automatic one-shot segmentation of synthetic images generated using StyleGANs. As to the need for `one-shot segmentation', we want the network to carry out a semantic segmentation of the images on the fly, that is, as they are being produced at inference time. The implementation of our framework is based on the observation that the multi-scale hidden features produced by a GAN during image synthesis hold useful semantic information that can be utilized for automatic segmentation. Using these features, our proposed framework learns to segment synthetic images using a novel self-supervised, contrastive clustering algorithm that projects the hidden features in the generator onto a compact feature space for per-pixel classification. This contrastive learner uses a swapped prediction loss for image segmentation that is computed using pixel-wise cluster assignments for the image and its transformed variants. Using the hidden features from an already pre-trained GAN for clustering, this leads to a much faster learning of the pixel-wise feature vectors for one-shot segmentation. We have tested our implementation on a number of standard benchmarks (CelebA, LSUN, PASCAL-Part) for object and part segmentation. The results of our experiments yield a segmentation performance that not only outperforms the semi-supervised baseline methods with an average wIoU margin of 1.02 % but also improves the inference speeds by a peak factor of 4.5. Finally, we also show the results of using the proposed framework in the implementation of BagGAN, a GAN-based framework for the production of annotated synthetic baggage X-ray scans for threat detection. This one-shot learning framework was trained and tested on the PIDRay baggage screening benchmark for 5 different threat categories to yield a segmentation performance which stands close to its baseline segmenter.  ( 2 min )
    Upper Bound of Real Log Canonical Threshold of Tensor Decomposition and its Application to Bayesian Inference. (arXiv:2303.05731v1 [cs.LG])
    Tensor decomposition is now being used for data analysis, information compression, and knowledge recovery. However, the mathematical property of tensor decomposition is not yet fully clarified because it is one of singular learning machines. In this paper, we give the upper bound of its real log canonical threshold (RLCT) of the tensor decomposition by using an algebraic geometrical method and derive its Bayesian generalization error theoretically. We also give considerations about its mathematical property through numerical experiments.  ( 2 min )
    Boosting Adversarial Attacks by Leveraging Decision Boundary Information. (arXiv:2303.05719v1 [cs.CV])
    Due to the gap between a substitute model and a victim model, the gradient-based noise generated from a substitute model may have low transferability for a victim model since their gradients are different. Inspired by the fact that the decision boundaries of different models do not differ much, we conduct experiments and discover that the gradients of different models are more similar on the decision boundary than in the original position. Moreover, since the decision boundary in the vicinity of an input image is flat along most directions, we conjecture that the boundary gradients can help find an effective direction to cross the decision boundary of the victim models. Based on it, we propose a Boundary Fitting Attack to improve transferability. Specifically, we introduce a method to obtain a set of boundary points and leverage the gradient information of these points to update the adversarial examples. Notably, our method can be combined with existing gradient-based methods. Extensive experiments prove the effectiveness of our method, i.e., improving the success rate by 5.6% against normally trained CNNs and 14.9% against defense CNNs on average compared to state-of-the-art transfer-based attacks. Further we compare transformers with CNNs, the results indicate that transformers are more robust than CNNs. However, our method still outperforms existing methods when attacking transformers. Specifically, when using CNNs as substitute models, our method obtains an average attack success rate of 58.2%, which is 10.8% higher than other state-of-the-art transfer-based attacks.  ( 2 min )
    Provably Efficient Model-Free Algorithms for Non-stationary CMDPs. (arXiv:2303.05733v1 [cs.LG])
    We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov Decision Processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost). In the non-stationary environment, reward, utility functions, and transition kernels can vary arbitrarily over time as long as the cumulative variations do not exceed certain variation budgets. We propose the first model-free, simulator-free RL algorithms with sublinear regret and zero constraint violation for non-stationary CMDPs in both tabular and linear function approximation settings with provable performance guarantees. Our results on regret bound and constraint violation for the tabular case match the corresponding best results for stationary CMDPs when the total budget is known. Additionally, we present a general framework for addressing the well-known challenges associated with analyzing non-stationary CMDPs, without requiring prior knowledge of the variation budget. We apply the approach for both tabular and linear approximation settings.  ( 2 min )
    Boosting Semi-Supervised Few-Shot Object Detection with SoftER Teacher. (arXiv:2303.05739v1 [cs.CV])
    Few-shot object detection is an emerging problem aimed at detecting novel concepts from few exemplars. Existing approaches to few-shot detection assume abundant base labels to adapt to novel objects. This paper explores the task of semi-supervised few-shot detection by considering a realistic scenario which lacks abundant labels for both base and novel objects. Motivated by this unique problem, we introduce SoftER Teacher, a robust detector combining the advantages of pseudo-labeling with representation learning on region proposals. SoftER Teacher harnesses unlabeled data to jointly optimize for semi-supervised few-shot detection without explicitly relying on abundant base labels. Extensive experiments show that SoftER Teacher matches the novel class performance of a strong supervised detector using only 10% of base labels. Our work also sheds insight into a previously unknown relationship between semi-supervised and few-shot detection to suggest that a stronger semi-supervised detector leads to a more label-efficient few-shot detector. Code and models are available at https://github.com/lexisnexis-risk-open-source/ledetection  ( 2 min )
    Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings. (arXiv:2303.05737v1 [eess.AS])
    Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower to adopt this technology, in part due to the importance of avoiding medically-relevant transcription mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penalizes clinically-relevant mistakes more than others. We demonstrate that this metric more closely aligns with clinician preferences on medical sentences as compared to other metrics (WER, BLUE, METEOR, etc), sometimes by wide margins. We collect a benchmark of 13 clinician preferences on 149 realistic medical sentences called the Clinician Transcript Preference benchmark (CTP), demonstrate that CBERTScore more closely matches what clinicians prefer, and release the benchmark for the community to further develop clinically-aware ASR metrics.  ( 2 min )
    Fusarium head blight detection, spikelet estimation, and severity assessment in wheat using 3D convolutional neural networks. (arXiv:2303.05634v1 [cs.CV])
    Fusarium head blight (FHB) is one of the most significant diseases affecting wheat and other small grain cereals worldwide. The development of resistant varieties requires the laborious task of field and greenhouse phenotyping. The applications considered in this work are the automated detection of FHB disease symptoms expressed on a wheat plant, the automated estimation of the total number of spikelets and the total number of infected spikelets on a wheat head, and the automated assessment of the FHB severity in infected wheat. The data used to generate the results are 3-dimensional (3D) multispectral point clouds (PC), which are 3D collections of points - each associated with a red, green, blue (RGB), and near-infrared (NIR) measurement. Over 300 wheat plant images were collected using a multispectral 3D scanner, and the labelled UW-MRDC 3D wheat dataset was created. The data was used to develop novel and efficient 3D convolutional neural network (CNN) models for FHB detection, which achieved 100% accuracy. The influence of the multispectral information on performance was evaluated, and our results showed the dominance of the RGB channels over both the NIR and the NIR plus RGB channels combined. Furthermore, novel and efficient 3D CNNs were created to estimate the total number of spikelets and the total number of infected spikelets on a wheat head, and our best models achieved mean absolute errors (MAE) of 1.13 and 1.56, respectively. Moreover, 3D CNN models for FHB severity estimation were created, and our best model achieved 8.6 MAE. A linear regression analysis between the visual FHB severity assessment and the FHB severity predicted by our 3D CNN was performed, and the results showed a significant correlation between the two variables with a 0.0001 P-value and 0.94 R-squared.  ( 3 min )
    A dual basis approach to multidimensional scaling: spectral analysis and graph regularity. (arXiv:2303.05682v1 [math.SP])
    Classical multidimensional scaling (CMDS) is a technique that aims to embed a set of objects in a Euclidean space given their pairwise Euclidean distance matrix. The main part of CMDS is based on double centering a squared distance matrix and employing a truncated eigendecomposition to recover the point coordinates. A central result in CMDS connects the squared Euclidean matrix to a Gram matrix derived from the set of points. In this paper, we study a dual basis approach to classical multidimensional scaling. We give an explicit formula for the dual basis and fully characterize the spectrum of an essential matrix in the dual basis framework. We make connections to a related problem in metric nearness.  ( 2 min )
    EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models. (arXiv:2303.05656v1 [cs.LG])
    Electronic health records (EHR) contain vast biomedical knowledge and are rich resources for developing precise medicine systems. However, due to privacy concerns, there are limited high-quality EHR data accessible to researchers hence hindering the advancement of methodologies. Recent research has explored using generative modelling methods to synthesize realistic EHR data, and most proposed methods are based on the generative adversarial network (GAN) and its variants for EHR synthesis. Although GAN-style methods achieved state-of-the-art performance in generating high-quality EHR data, such methods are hard to train and prone to mode collapse. Diffusion models are recently proposed generative modelling methods and set cutting-edge performance in image generation. The performance of diffusion models in realistic EHR synthesis is rarely explored. In this work, we explore whether the superior performance of diffusion models can translate to the domain of EHR synthesis and propose a novel EHR synthesis method named EHRDiff. Through comprehensive experiments, EHRDiff achieves new state-of-the-art performance for the quality of synthetic EHR data and can better protect private information in real training EHRs in the meanwhile.  ( 2 min )
    Fairness-enhancing deep learning for ride-hailing demand prediction. (arXiv:2303.05698v1 [cs.LG])
    Short-term demand forecasting for on-demand ride-hailing services is one of the fundamental issues in intelligent transportation systems. However, previous travel demand forecasting research predominantly focused on improving prediction accuracy, ignoring fairness issues such as systematic underestimations of travel demand in disadvantaged neighborhoods. This study investigates how to measure, evaluate, and enhance prediction fairness between disadvantaged and privileged communities in spatial-temporal demand forecasting of ride-hailing services. A two-pronged approach is taken to reduce the demand prediction bias. First, we develop a novel deep learning model architecture, named socially aware neural network (SA-Net), to integrate the socio-demographics and ridership information for fair demand prediction through an innovative socially-aware convolution operation. Second, we propose a bias-mitigation regularization method to mitigate the mean percentage prediction error gap between different groups. The experimental results, validated on the real-world Chicago Transportation Network Company (TNC) data, show that the de-biasing SA-Net can achieve better predictive performance in both prediction accuracy and fairness. Specifically, the SA-Net improves prediction accuracy for both the disadvantaged and privileged groups compared with the state-of-the-art models. When coupled with the bias mitigation regularization method, the de-biasing SA-Net effectively bridges the mean percentage prediction error gap between the disadvantaged and privileged groups, and also protects the disadvantaged regions against systematic underestimation of TNC demand. Our proposed de-biasing method can be adopted in many existing short-term travel demand estimation models, and can be utilized for various other spatial-temporal prediction tasks such as crime incidents predictions.  ( 2 min )
    Efficient Real Time Recurrent Learning through combined activity and parameter sparsity. (arXiv:2303.05641v1 [cs.LG])
    Backpropagation through time (BPTT) is the standard algorithm for training recurrent neural networks (RNNs), which requires separate simulation phases for the forward and backward passes for inference and learning, respectively. Moreover, BPTT requires storing the complete history of network states between phases, with memory consumption growing proportional to the input sequence length. This makes BPTT unsuited for online learning and presents a challenge for implementation on low-resource real-time systems. Real-Time Recurrent Learning (RTRL) allows online learning, and the growth of required memory is independent of sequence length. However, RTRL suffers from exceptionally high computational costs that grow proportional to the fourth power of the state size, making RTRL computationally intractable for all but the smallest of networks. In this work, we show that recurrent networks exhibiting high activity sparsity can reduce the computational cost of RTRL. Moreover, combining activity and parameter sparsity can lead to significant enough savings in computational and memory costs to make RTRL practical. Unlike previous work, this improvement in the efficiency of RTRL can be achieved without using any approximations for the learning process.  ( 2 min )
    Improving Weakly Supervised Sound Event Detection with Causal Intervention. (arXiv:2303.05678v1 [cs.SD])
    Existing weakly supervised sound event detection (WSSED) work has not explored both types of co-occurrences simultaneously, i.e., some sound events often co-occur, and their occurrences are usually accompanied by specific background sounds, so they would be inevitably entangled, causing misclassification and biased localization results with only clip-level supervision. To tackle this issue, we first establish a structural causal model (SCM) to reveal that the context is the main cause of co-occurrence confounders that mislead the model to learn spurious correlations between frames and clip-level labels. Based on the causal analysis, we propose a causal intervention (CI) method for WSSED to remove the negative impact of co-occurrence confounders by iteratively accumulating every possible context of each class and then re-projecting the contexts to the frame-level features for making the event boundary clearer. Experiments show that our method effectively improves the performance on multiple datasets and can generalize to various baseline models.  ( 2 min )
    Variance-aware robust reinforcement learning with linear function approximation with heavy-tailed rewards. (arXiv:2303.05606v1 [cs.LG])
    This paper presents two algorithms, AdaOFUL and VARA, for online sequential decision-making in the presence of heavy-tailed rewards with only finite variances. For linear stochastic bandits, we address the issue of heavy-tailed rewards by modifying the adaptive Huber regression and proposing AdaOFUL. AdaOFUL achieves a state-of-the-art regret bound of $\widetilde{\mathcal{O}}\big(d\big(\sum_{t=1}^T \nu_{t}^2\big)^{1/2}+d\big)$ as if the rewards were uniformly bounded, where $\nu_{t}^2$ is the observed conditional variance of the reward at round $t$, $d$ is the feature dimension, and $\widetilde{\mathcal{O}}(\cdot)$ hides logarithmic dependence. Building upon AdaOFUL, we propose VARA for linear MDPs, which achieves a tighter variance-aware regret bound of $\widetilde{\mathcal{O}}(d\sqrt{H\mathcal{G}^*K})$. Here, $H$ is the length of episodes, $K$ is the number of episodes, and $\mathcal{G}^*$ is a smaller instance-dependent quantity that can be bounded by other instance-dependent quantities when additional structural conditions on the MDP are satisfied. Our regret bound is superior to the current state-of-the-art bounds in three ways: (1) it depends on a tighter instance-dependent quantity and has optimal dependence on $d$ and $H$, (2) we can obtain further instance-dependent bounds of $\mathcal{G}^*$ under additional structural conditions on the MDP, and (3) our regret bound is valid even when rewards have only finite variances, achieving a level of generality unmatched by previous works. Overall, our modified adaptive Huber regression algorithm may serve as a useful building block in the design of algorithms for online problems with heavy-tailed rewards.  ( 2 min )
    Tradeoff of generalization error in unsupervised learning. (arXiv:2303.05718v1 [cond-mat.stat-mech])
    Finding the optimal model complexity that minimizes the generalization error (GE) is a key issue of machine learning. For the conventional supervised learning, this task typically involves the bias-variance tradeoff: lowering the bias by making the model more complex entails an increase in the variance. Meanwhile, little has been studied about whether the same tradeoff exists for unsupervised learning. In this study, we propose that unsupervised learning generally exhibits a two-component tradeoff of the GE, namely the model error and the data error -- using a more complex model reduces the model error at the cost of the data error, with the data error playing a more significant role for a smaller training dataset. This is corroborated by training the restricted Boltzmann machine to generate the configurations of the two-dimensional Ising model at a given temperature and the totally asymmetric simple exclusion process with given entry and exit rates. Our results also indicate that the optimal model tends to be more complex when the data to be learned are more complex.  ( 2 min )
    An Improved Data Augmentation Scheme for Model Predictive Control Policy Approximation. (arXiv:2303.05607v1 [eess.SY])
    This paper considers the problem of data generation for MPC policy approximation. Learning an approximate MPC policy from expert demonstrations requires a large data set consisting of optimal state-action pairs, sampled across the feasible state space. Yet, the key challenge of efficiently generating the training samples has not been studied widely. Recently, a sensitivity-based data augmentation framework for MPC policy approximation was proposed, where the parametric sensitivities are exploited to cheaply generate several additional samples from a single offline MPC computation. The error due to augmenting the training data set with inexact samples was shown to increase with the size of the neighborhood around each sample used for data augmentation. Building upon this work, this letter paper presents an improved data augmentation scheme based on predictor-corrector steps that enforces a user-defined level of accuracy, and shows that the error bound of the augmented samples are independent of the size of the neighborhood used for data augmentation.  ( 2 min )
    Generalization analysis of an unfolding network for analysis-based Compressed Sensing. (arXiv:2303.05582v1 [cs.LG])
    Unfolding networks have shown promising results in the Compressed Sensing (CS) field. Yet, the investigation of their generalization ability is still in its infancy. In this paper, we perform generalization analysis of a state-of-the-art ADMM-based unfolding network, which jointly learns a decoder for CS and a sparsifying redundant analysis operator. To this end, we first impose a structural constraint on the learnable sparsifier, which parametrizes the network's hypothesis class. For the latter, we estimate its Rademacher complexity. With this estimate in hand, we deliver generalization error bounds for the examined network. Finally, the validity of our theory is assessed and numerical comparisons to a state-of-the-art unfolding network are made, on synthetic and real-world datasets. Our experimental results demonstrate that our proposed framework complies with our theoretical findings and outperforms the baseline, consistently for all datasets.  ( 2 min )
    Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation. (arXiv:2303.05593v1 [cs.LG])
    In recent years, knowledge distillation has become a cornerstone of efficiently deployed machine learning, with labs and industries using knowledge distillation to train models that are inexpensive and resource-optimized. Trojan attacks have contemporaneously gained significant prominence, revealing fundamental vulnerabilities in deep learning models. Given the widespread use of knowledge distillation, in this work we seek to exploit the unlabelled data knowledge distillation process to embed Trojans in a student model without introducing conspicuous behavior in the teacher. We ultimately devise a Trojan attack that effectively reduces student accuracy, does not alter teacher performance, and is efficiently constructible in practice.  ( 2 min )
    Clustering with minimum spanning trees: How good can it be?. (arXiv:2303.05679v1 [stat.ML])
    Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they can be meaningful in data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can overall be very competitive. Next, instead of proposing yet another algorithm that performs well on a limited set of examples, we review, study, extend, and generalise existing, the state-of-the-art MST-based partitioning schemes, which leads to a few new and interesting approaches. It turns out that the Genie method and the information-theoretic approaches often outperform the non-MST algorithms such as k-means, Gaussian mixtures, spectral clustering, BIRCH, and classical hierarchical agglomerative procedures.  ( 2 min )
    Exploration of the search space of Gaussian graphical models for paired data. (arXiv:2303.05561v1 [stat.ML])
    We consider the problem of learning a Gaussian graphical model in the case where the observations come from two dependent groups sharing the same variables. We focus on a family of coloured Gaussian graphical models specifically suited for the paired data problem. Commonly, graphical models are ordered by the submodel relationship so that the search space is a lattice, called the model inclusion lattice. We introduce a novel order between models, named the twin order. We show that, embedded with this order, the model space is a lattice that, unlike the model inclusion lattice, is distributive. Furthermore, we provide the relevant rules for the computation of the neighbours of a model. The latter are more efficient than the same operations in the model inclusion lattice, and are then exploited to achieve a more efficient exploration of the search space. These results can be applied to improve the efficiency of both greedy and Bayesian model search procedures. Here we implement a stepwise backward elimination procedure and evaluate its performance by means of simulations. Finally, the procedure is applied to learn a brain network from fMRI data where the two groups correspond to the left and right hemispheres, respectively.  ( 2 min )
    Monitoring Efficiency of IoT Wireless Charging. (arXiv:2303.05629v1 [cs.NI])
    Crowdsourcing wireless energy is a novel and convenient solution to charge nearby IoT devices. Several applications have been proposed to enable peer-to-peer wireless energy charging. However, none of them considered the energy efficiency of the wireless transfer of energy. In this paper, we propose an energy estimation framework that predicts the actual received energy. Our framework uses two machine learning algorithms, namely XGBoost and Neural Network, to estimate the received energy. The result shows that the Neural Network model is better than XGBoost at predicting the received energy. We train and evaluate our models by collecting a real wireless energy dataset.  ( 2 min )
  • Open

    Metrizing Fairness. (arXiv:2205.15049v3 [cs.LG] UPDATED)
    We study supervised learning problems for predicting properties of individuals who belong to one of two demographic groups, and we seek predictors that are fair according to statistical parity. This means that the distributions of the predictions within the two groups should be close with respect to the Kolmogorov distance, and fairness is achieved by penalizing the dissimilarity of these two distributions in the objective function of the learning problem. In this paper, we showcase conceptual and computational benefits of measuring unfairness with integral probability metrics (IPMs) other than the Kolmogorov distance. Conceptually, we show that the generator of any IPM can be interpreted as a family of utility functions and that unfairness with respect to this IPM arises if individuals in the two demographic groups have diverging expected utilities. We also prove that the unfairness-regularized prediction loss admits unbiased gradient estimators if unfairness is measured by the squared $\mathcal L^2$-distance or by a squared maximum mean discrepancy. In this case, the fair learning problem is susceptible to efficient stochastic gradient descent (SGD) algorithms. Numerical experiments on real data show that these SGD algorithms outperform state-of-the-art methods for fair learning in that they achieve superior accuracy-unfairness trade-offs -- sometimes orders of magnitude faster. Finally, we identify conditions under which statistical parity can improve prediction accuracy.
    Sliced-Wasserstein on Symmetric Positive Definite Matrices for M/EEG Signals. (arXiv:2303.05798v1 [cs.LG])
    When dealing with electro or magnetoencephalography records, many supervised prediction tasks are solved by working with covariance matrices to summarize the signals. Learning with these matrices requires using Riemanian geometry to account for their structure. In this paper, we propose a new method to deal with distributions of covariance matrices and demonstrate its computational efficiency on M/EEG multivariate time series. More specifically, we define a Sliced-Wasserstein distance between measures of symmetric positive definite matrices that comes with strong theoretical guarantees. Then, we take advantage of its properties and kernel methods to apply this distance to brain-age prediction from MEG data and compare it to state-of-the-art algorithms based on Riemannian geometry. Finally, we show that it is an efficient surrogate to the Wasserstein distance in domain adaptation for Brain Computer Interface applications.
    A Contrastive Approach to Online Change Point Detection. (arXiv:2206.10143v2 [stat.ML] UPDATED)
    We suggest a novel procedure for online change point detection. Our approach expands an idea of maximizing a discrepancy measure between points from pre-change and post-change distributions. This leads to a flexible procedure suitable for both parametric and nonparametric scenarios. We prove non-asymptotic bounds on the average running length of the procedure and its expected detection delay. The efficiency of the algorithm is illustrated with numerical experiments on synthetic and real-world data sets.
    A pseudo-likelihood approach to community detection in weighted networks. (arXiv:2303.05909v1 [stat.ME])
    Community structure is common in many real networks, with nodes clustered in groups sharing the same connections patterns. While many community detection methods have been developed for networks with binary edges, few of them are applicable to networks with weighted edges, which are common in practice. We propose a pseudo-likelihood community estimation algorithm derived under the weighted stochastic block model for networks with normally distributed edge weights, extending the pseudo-likelihood algorithm for binary networks, which offers some of the best combinations of accuracy and computational efficiency. We prove that the estimates obtained by the proposed method are consistent under the assumption of homogeneous networks, a weighted analogue of the planted partition model, and show that they work well in practice for both homogeneous and heterogeneous networks. We illustrate the method on simulated networks and on a fMRI dataset, where edge weights represent connectivity between brain regions and are expected to be close to normal in distribution by construction.
    Modeling Events and Interactions through Temporal Processes -- A Survey. (arXiv:2303.06067v1 [cs.LG])
    In real-world scenario, many phenomena produce a collection of events that occur in continuous time. Point Processes provide a natural mathematical framework for modeling these sequences of events. In this survey, we investigate probabilistic models for modeling event sequences through temporal processes. We revise the notion of event modeling and provide the mathematical foundations that characterize the literature on the topic. We define an ontology to categorize the existing approaches in terms of three families: simple, marked, and spatio-temporal point processes. For each family, we systematically review the existing approaches based based on deep learning. Finally, we analyze the scenarios where the proposed techniques can be used for addressing prediction and modeling aspects.
    Approximate Regions of Attraction in Learning with Decision-Dependent Distributions. (arXiv:2107.00055v3 [cs.LG] UPDATED)
    As data-driven methods are deployed in real-world settings, the processes that generate the observed data will often react to the decisions of the learner. For example, a data source may have some incentive for the algorithm to provide a particular label (e.g. approve a bank loan), and manipulate their features accordingly. Work in strategic classification and decision-dependent distributions seeks to characterize the closed-loop behavior of deploying learning algorithms by explicitly considering the effect of the classifier on the underlying data distribution. More recently, works in performative prediction seek to classify the closed-loop behavior by considering general properties of the mapping from classifier to data distribution, rather than an explicit form. Building on this notion, we analyze repeated risk minimization as the perturbed trajectories of the gradient flows of performative risk minimization. We consider the case where there may be multiple local minimizers of performative risk, motivated by situations where the initial conditions may have significant impact on the long-term behavior of the system. We provide sufficient conditions to characterize the region of attraction for the various equilibria in this settings. Additionally, we introduce the notion of performative alignment, which provides a geometric condition on the convergence of repeated risk minimization to performative risk minimizers.
    Long-tailed Classification from a Bayesian-decision-theory Perspective. (arXiv:2303.06075v1 [cs.LG])
    Long-tailed classification poses a challenge due to its heavy imbalance in class probabilities and tail-sensitivity risks with asymmetric misprediction costs. Recent attempts have used re-balancing loss and ensemble methods, but they are largely heuristic and depend heavily on empirical results, lacking theoretical explanation. Furthermore, existing methods overlook the decision loss, which characterizes different costs associated with tailed classes. This paper presents a general and principled framework from a Bayesian-decision-theory perspective, which unifies existing techniques including re-balancing and ensemble methods, and provides theoretical justifications for their effectiveness. From this perspective, we derive a novel objective based on the integrated risk and a Bayesian deep-ensemble approach to improve the accuracy of all classes, especially the ``tail". Besides, our framework allows for task-adaptive decision loss which provides provably optimal decisions in varying task scenarios, along with the capability to quantify uncertainty. Finally, We conduct comprehensive experiments, including standard classification, tail-sensitive classification with a new False Head Rate metric, calibration, and ablation studies. Our framework significantly improves the current SOTA even on large-scale real-world datasets like ImageNet.
    Maximal Objectives in the Multi-armed Bandit with Applications. (arXiv:2006.06853v6 [cs.LG] UPDATED)
    In several applications of the stochastic multi-armed bandit problem, the traditional objective of maximizing the expected total reward can be inappropriate. In this paper, motivated by certain operational concerns in online platforms, we consider a new objective in the classical setup. Given $K$ arms, instead of maximizing the expected total reward from $T$ pulls (the traditional "sum" objective), we consider the vector of total rewards earned from each of the $K$ arms at the end of $T$ pulls and aim to maximize the expected highest total reward across arms (the "max" objective). For this objective, we show that any policy must incur an instance-dependent asymptotic regret of $\Omega(\log T)$ (with a higher instance-dependent constant compared to the traditional objective) and a worst-case regret of $\Omega(K^{1/3}T^{2/3})$. We then design an adaptive explore-then-commit policy featuring exploration based on appropriately tuned confidence bounds on the mean reward and an adaptive stopping criterion, which adapts to the problem difficulty and achieves these bounds (up to logarithmic factors). We then generalize our algorithmic insights to the problem of maximizing the expected value of the average total reward of the top $m$ arms with the highest total rewards. Our numerical experiments demonstrate the efficacy of our policies compared to several natural alternatives in practical parameter regimes. We discuss applications of these new objectives to the problem of grooming an adequate supply of value-providing market participants (workers/sellers/service providers) in online platforms.
    Seq2Seq Surrogates of Epidemic Models to Facilitate Bayesian Inference. (arXiv:2209.09617v2 [cs.LG] UPDATED)
    Epidemic models are powerful tools in understanding infectious disease. However, as they increase in size and complexity, they can quickly become computationally intractable. Recent progress in modelling methodology has shown that surrogate models can be used to emulate complex epidemic models with a high-dimensional parameter space. We show that deep sequence-to-sequence (seq2seq) models can serve as accurate surrogates for complex epidemic models with sequence based model parameters, effectively replicating seasonal and long-term transmission dynamics. Once trained, our surrogate can predict scenarios a several thousand times faster than the original model, making them ideal for policy exploration. We demonstrate that replacing a traditional epidemic model with a learned simulator facilitates robust Bayesian inference.
    The CMA Evolution Strategy: A Tutorial. (arXiv:1604.00772v2 [cs.LG] UPDATED)
    This tutorial introduces the CMA Evolution Strategy (ES), where CMA stands for Covariance Matrix Adaptation. The CMA-ES is a stochastic, or randomized, method for real-parameter (continuous domain) optimization of non-linear, non-convex functions. We try to motivate and derive the algorithm from intuitive concepts and from requirements of non-linear, non-convex search in continuous domain.
    Exploration of the search space of Gaussian graphical models for paired data. (arXiv:2303.05561v1 [stat.ML])
    We consider the problem of learning a Gaussian graphical model in the case where the observations come from two dependent groups sharing the same variables. We focus on a family of coloured Gaussian graphical models specifically suited for the paired data problem. Commonly, graphical models are ordered by the submodel relationship so that the search space is a lattice, called the model inclusion lattice. We introduce a novel order between models, named the twin order. We show that, embedded with this order, the model space is a lattice that, unlike the model inclusion lattice, is distributive. Furthermore, we provide the relevant rules for the computation of the neighbours of a model. The latter are more efficient than the same operations in the model inclusion lattice, and are then exploited to achieve a more efficient exploration of the search space. These results can be applied to improve the efficiency of both greedy and Bayesian model search procedures. Here we implement a stepwise backward elimination procedure and evaluate its performance by means of simulations. Finally, the procedure is applied to learn a brain network from fMRI data where the two groups correspond to the left and right hemispheres, respectively.
    A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms. (arXiv:2303.06058v1 [cs.LG])
    In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists in checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions to prove a logarithmic regret. As a direct application we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS), under various models for the distributions including single parameter exponential families, Gaussian distributions, bounded distributions, or distributions satisfying some conditions on their moments. In particular, we prove that MED is asymptotically optimal for all these models, but also provide a simple regret analysis of some TS algorithms for which the optimality is already known. We then further illustrate the interest of our approach, by analyzing a new Non-Parametric TS algorithm (h-NPTS), adapted to some families of unbounded reward distributions with a bounded h-moment. This model can for instance capture some non-parametric families of distributions whose variance is upper bounded by a known constant.
    Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models. (arXiv:2210.07723v2 [stat.ML] UPDATED)
    Various privacy-preserving frameworks that respect the individual's privacy in the analysis of data have been developed in recent years. However, available model classes such as simple statistics or generalized linear models lack the flexibility required for a good approximation of the underlying data-generating process in practice. In this paper, we propose an algorithm for a distributed, privacy-preserving, and lossless estimation of generalized additive mixed models (GAMM) using component-wise gradient boosting (CWB). Making use of CWB allows us to reframe the GAMM estimation as a distributed fitting of base learners using the $L_2$-loss. In order to account for the heterogeneity of different data location sites, we propose a distributed version of a row-wise tensor product that allows the computation of site-specific (smooth) effects. Our adaption of CWB preserves all the important properties of the original algorithm, such as an unbiased feature selection and the feasibility to fit models in high-dimensional feature spaces, and yields equivalent model estimates as CWB on pooled data. Next to a derivation of the equivalence of both algorithms, we also showcase the efficacy of our algorithm on a distributed heart disease data set and compare it with state-of-the-art methods.
    Towards better traffic volume estimation: Tackling both underdetermined and non-equilibrium problems via a correlation adaptive graph convolution network. (arXiv:2303.05660v1 [stat.ML])
    Traffic volume is an indispensable ingredient to provide fine-grained information for traffic management and control. However, due to limited deployment of traffic sensors, obtaining full-scale volume information is far from easy. Existing works on this topic primarily focus on improving the overall estimation accuracy of a particular method and ignore the underlying challenges of volume estimation, thereby having inferior performances on some critical tasks. This paper studies two key problems with regard to traffic volume estimation: (1) underdetermined traffic flows caused by undetected movements, and (2) non-equilibrium traffic flows arise from congestion propagation. Here we demonstrate a graph-based deep learning method that can offer a data-driven, model-free and correlation adaptive approach to tackle the above issues and perform accurate network-wide traffic volume estimation. Particularly, in order to quantify the dynamic and nonlinear relationships between traffic speed and volume for the estimation of underdetermined flows, a speed patternadaptive adjacent matrix based on graph attention is developed and integrated into the graph convolution process, to capture non-local correlations between sensors. To measure the impacts of non-equilibrium flows, a temporal masked and clipped attention combined with a gated temporal convolution layer is customized to capture time-asynchronous correlations between upstream and downstream sensors. We then evaluate our model on a real-world highway traffic volume dataset and compare it with several benchmark models. It is demonstrated that the proposed model achieves high estimation accuracy even under 20% sensor coverage rate and outperforms other baselines significantly, especially on underdetermined and non-equilibrium flow locations. Furthermore, comprehensive quantitative model analysis are also carried out to justify the model designs.
    Rosenthal-type inequalities for linear statistics of Markov chains. (arXiv:2303.05838v1 [math.PR])
    In this paper, we establish novel deviation bounds for additive functionals of geometrically ergodic Markov chains similar to Rosenthal and Bernstein-type inequalities for sums of independent random variables. We pay special attention to the dependence of our bounds on the mixing time of the corresponding chain. Our proof technique is, as far as we know, new and based on the recurrent application of the Poisson decomposition. We relate the constants appearing in our moment bounds to the constants from the martingale version of the Rosenthal inequality and show an explicit dependence on the parameters of the underlying Markov kernel.
    POLICE: Provably Optimal Linear Constraint Enforcement for Deep Neural Networks. (arXiv:2211.01340v3 [cs.LG] UPDATED)
    Deep Neural Networks (DNNs) outshine alternative function approximators in many settings thanks to their modularity in composing any desired differentiable operator. The formed parametrized functional is then tuned to solve a task at hand from simple gradient descent. This modularity comes at the cost of making strict enforcement of constraints on DNNs, e.g. from a priori knowledge of the task, or from desired physical properties, an open challenge. In this paper we propose the first provable affine constraint enforcement method for DNNs that only requires minimal changes into a given DNN's forward-pass, that is computationally friendly, and that leaves the optimization of the DNN's parameter to be unconstrained, i.e. standard gradient-based method can be employed. Our method does not require any sampling and provably ensures that the DNN fulfills the affine constraint on a given input space's region at any point during training, and testing. We coin this method POLICE, standing for Provably Optimal LInear Constraint Enforcement. Github: https://github.com/RandallBalestriero/POLICE
    Variance-aware robust reinforcement learning with linear function approximation with heavy-tailed rewards. (arXiv:2303.05606v1 [cs.LG])
    This paper presents two algorithms, AdaOFUL and VARA, for online sequential decision-making in the presence of heavy-tailed rewards with only finite variances. For linear stochastic bandits, we address the issue of heavy-tailed rewards by modifying the adaptive Huber regression and proposing AdaOFUL. AdaOFUL achieves a state-of-the-art regret bound of $\widetilde{\mathcal{O}}\big(d\big(\sum_{t=1}^T \nu_{t}^2\big)^{1/2}+d\big)$ as if the rewards were uniformly bounded, where $\nu_{t}^2$ is the observed conditional variance of the reward at round $t$, $d$ is the feature dimension, and $\widetilde{\mathcal{O}}(\cdot)$ hides logarithmic dependence. Building upon AdaOFUL, we propose VARA for linear MDPs, which achieves a tighter variance-aware regret bound of $\widetilde{\mathcal{O}}(d\sqrt{H\mathcal{G}^*K})$. Here, $H$ is the length of episodes, $K$ is the number of episodes, and $\mathcal{G}^*$ is a smaller instance-dependent quantity that can be bounded by other instance-dependent quantities when additional structural conditions on the MDP are satisfied. Our regret bound is superior to the current state-of-the-art bounds in three ways: (1) it depends on a tighter instance-dependent quantity and has optimal dependence on $d$ and $H$, (2) we can obtain further instance-dependent bounds of $\mathcal{G}^*$ under additional structural conditions on the MDP, and (3) our regret bound is valid even when rewards have only finite variances, achieving a level of generality unmatched by previous works. Overall, our modified adaptive Huber regression algorithm may serve as a useful building block in the design of algorithms for online problems with heavy-tailed rewards.
    MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning. (arXiv:2111.12664v2 [cs.CV] UPDATED)
    Self-supervised contrastive learning frameworks have progressed rapidly over the last few years. In this paper, we propose a novel mutual information optimization-based loss function for contrastive learning. We model our pre-training task as a binary classification problem to induce an implicit contrastive effect and predict whether a pair is positive or negative. We further improve the n\"aive loss function using the Majorize-Minimizer principle and such improvement helps us to track the problem mathematically. Unlike the existing methods, the proposed loss function optimizes the mutual information in both positive and negative pairs. We also present a closed-form expression for the parameter gradient flow and compare the behavior of the proposed loss function using its Hessian eigen-spectrum to analytically study the convergence of SSL frameworks. The proposed method outperforms the SOTA contrastive self-supervised frameworks on benchmark datasets like CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet. After 200 epochs of pre-training with ResNet-18 as the backbone, the proposed model achieves an accuracy of 86.2\%, 58.18\%, 77.49\%, and 30.87\% on CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet datasets, respectively, and surpasses the SOTA contrastive baseline by 1.23\%, 3.57\%, 2.00\%, and 0.33\%, respectively.
    SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models. (arXiv:2106.00553v4 [cs.LG] UPDATED)
    In recent years, implicit deep learning has emerged as a method to increase the effective depth of deep neural networks. While their training is memory-efficient, they are still significantly slower to train than their explicit counterparts. In Deep Equilibrium Models (DEQs), the training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. In this paper, we propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer. The main idea is to use the quasi-Newton matrices from the forward pass to efficiently approximate the inverse Jacobian matrix in the direction needed for the gradient computation. We provide a theorem that motivates using our method with the original forward algorithms. In addition, by modifying these forward algorithms, we further provide theoretical guarantees that our method asymptotically estimates the true implicit gradient. We empirically study this approach and the recent Jacobian-Free method in different settings, ranging from hyperparameter optimization to large Multiscale DEQs (MDEQs) applied to CIFAR and ImageNet. Both methods reduce significantly the computational cost of the backward pass. While SHINE has a clear advantage on hyperparameter optimization problems, both methods attain similar computational performances for larger scale problems such as MDEQs at the cost of a limited performance drop compared to the original models.  ( 2 min )
    A novel notion of barycenter for probability distributions based on optimal weak mass transport. (arXiv:2102.13380v4 [stat.ML] UPDATED)
    We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass by Gozlanet al. (2017) and Backhoff-Veraguas et al. (2020). We provide a theoretical analysis of this object and discuss its interpretation in the light of convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classic optimal transport does) weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited for the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter and our approaches to compute it are illustrated on synthetic examples, validated on 2D real-world data and compared to standard Wasserstein barycenters.  ( 2 min )
    DORA: Exploring outlier representations in Deep Neural Networks. (arXiv:2206.04530v2 [cs.LG] UPDATED)
    Deep Neural Networks (DNNs) draw their power from the representations they learn. However, while being incredibly effective in learning complex abstractions, they are susceptible to learning malicious concepts, due to the spurious correlations inherent in the training data. So far, existing methods for uncovering such artifactual behavior in trained models focus on finding artifacts in the input data, which requires both availability of a data set and human supervision. In this paper, we introduce DORA (Data-agnOstic Representation Analysis): the first data-agnostic framework for the analysis of the representation space of DNNs. We propose a novel distance measure between representations that utilizes self-explaining capabilities within the network itself without access to any data and quantitatively validate its alignment with human-defined semantic distances. We further demonstrate that this metric could be utilized for the detection of anomalous representations, which may bear a risk of learning unintended spurious concepts deviating from the desired decision-making policy. Finally, we demonstrate the practical utility of DORA by analyzing and identifying artifactual representations in widely popular Computer Vision models.  ( 2 min )
    Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss. (arXiv:2303.05958v1 [cs.CL])
    This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft distillation between RNN-T architectures having different posterior distributions is challenging. In addition, bad teachers having high word-error-rate (WER) reduce the efficacy of KD. We investigate how to effectively distill knowledge from variable quality ASR teachers, which has not been studied before to the best of our knowledge. We show that a sequence-level KD, full-sum distillation, outperforms other distillation methods for RNN-T models, especially for bad teachers. We also propose a variant of full-sum distillation that distills the sequence discriminative knowledge of the teacher leading to further improvement in WER. We conduct experiments on public datasets namely SpeechStew and LibriSpeech, and on in-house production data.  ( 2 min )
    Feature Importance: A Closer Look at Shapley Values and LOCO. (arXiv:2303.05981v1 [stat.ME])
    There is much interest lately in explainability in statistics and machine learning. One aspect of explainability is to quantify the importance of various features (or covariates). Two popular methods for defining variable importance are LOCO (Leave Out COvariates) and Shapley Values. We take a look at the properties of these methods and their advantages and disadvantages. We are particularly interested in the effect of correlation between features which can obscure interpretability. Contrary to some claims, Shapley values do not eliminate feature correlation. We critique the game theoretic axioms for Shapley values and suggest some new axioms. We propose new, more statistically oriented axioms for feature importance and some measures that satisfy these axioms. However, correcting for correlation is a Faustian bargain: removing the effect of correlation creates other forms of bias. Ultimately, we recommend a slightly modified version of LOCO. We briefly consider how to modify Shapley values to better address feature correlation.  ( 2 min )
    Clustering with minimum spanning trees: How good can it be?. (arXiv:2303.05679v1 [stat.ML])
    Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they can be meaningful in data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can overall be very competitive. Next, instead of proposing yet another algorithm that performs well on a limited set of examples, we review, study, extend, and generalise existing, the state-of-the-art MST-based partitioning schemes, which leads to a few new and interesting approaches. It turns out that the Genie method and the information-theoretic approaches often outperform the non-MST algorithms such as k-means, Gaussian mixtures, spectral clustering, BIRCH, and classical hierarchical agglomerative procedures.  ( 2 min )
    An analytic theory for the dynamics of wide quantum neural networks. (arXiv:2203.16711v2 [quant-ph] UPDATED)
    Parameterized quantum circuits can be used as quantum neural networks and have the potential to outperform their classical counterparts when trained for addressing learning problems. To date, much of the results on their performance on practical problems are heuristic in nature. In particular, the convergence rate for the training of quantum neural networks is not fully understood. Here, we analyze the dynamics of gradient descent for the training error of a class of variational quantum machine learning models. We define wide quantum neural networks as parameterized quantum circuits in the limit of a large number of qubits and variational parameters. We then find a simple analytic formula that captures the average behavior of their loss function and discuss the consequences of our findings. For example, for random quantum circuits, we predict and characterize an exponential decay of the residual training error as a function of the parameters of the system. We finally validate our analytic results with numerical experiments.  ( 2 min )
    Uncertainty Estimates of Predictions via a General Bias-Variance Decomposition. (arXiv:2210.12256v2 [cs.LG] UPDATED)
    Reliably estimating the uncertainty of a prediction throughout the model lifecycle is crucial in many safety-critical applications. The most common way to measure this uncertainty is via the predicted confidence. While this tends to work well for in-domain samples, these estimates are unreliable under domain drift and restricted to classification. Alternatively, proper scores can be used for most predictive tasks but a bias-variance decomposition for model uncertainty does not exist in the current literature. In this work we introduce a general bias-variance decomposition for proper scores, giving rise to the Bregman Information as the variance term. We discover how exponential families and the classification log-likelihood are special cases and provide novel formulations. Surprisingly, we can express the classification case purely in the logit space. We showcase the practical relevance of this decomposition on several downstream tasks, including model ensembles and confidence regions. Further, we demonstrate how different approximations of the instance-level Bregman Information allow reliable out-of-distribution detection for all degrees of domain drift.  ( 2 min )
    Fast Diffusion Sampler for Inverse Problems by Geometric Decomposition. (arXiv:2303.05754v1 [cs.LG])
    Diffusion models have shown exceptional performance in solving inverse problems. However, one major limitation is the slow inference time. While faster diffusion samplers have been developed for unconditional sampling, there has been limited research on conditional sampling in the context of inverse problems. In this study, we propose a novel and efficient diffusion sampling strategy that employs the geometric decomposition of diffusion sampling. Specifically, we discover that the samples generated from diffusion models can be decomposed into two orthogonal components: a ``denoised" component obtained by projecting the sample onto the clean data manifold, and a ``noise" component that induces a transition to the next lower-level noisy manifold with the addition of stochastic noise. Furthermore, we prove that, under some conditions on the clean data manifold, the conjugate gradient update for imposing conditioning from the denoised signal belongs to the clean manifold, resulting in a much faster and more accurate diffusion sampling. Our method is applicable regardless of the parameterization and setting (i.e., VE, VP). Notably, we achieve state-of-the-art reconstruction quality on challenging real-world medical inverse imaging problems, including multi-coil MRI reconstruction and 3D CT reconstruction. Moreover, our proposed method achieves more than 80 times faster inference time than the previous state-of-the-art method.  ( 2 min )
    Product Jacobi-Theta Boltzmann machines with score matching. (arXiv:2303.05910v1 [stat.ML])
    The estimation of probability density functions is a non trivial task that over the last years has been tackled with machine learning techniques. Successful applications can be obtained using models inspired by the Boltzmann machine (BM) architecture. In this manuscript, the product Jacobi-Theta Boltzmann machine (pJTBM) is introduced as a restricted version of the Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection matrix. We show that score matching, based on the Fisher divergence, can be used to fit probability densities with the pJTBM more efficiently than with the original RTBM.  ( 2 min )
    RawNet: Fast End-to-End Neural Vocoder. (arXiv:1904.05351v2 [eess.AS] UPDATED)
    Neural network-based vocoders have recently demonstrated the powerful ability to synthesize high-quality speech. These models usually generate samples by conditioning on spectral features, such as Mel-spectrogram and fundamental frequency, which is crucial to speech synthesis. However, the feature extraction procession tends to depend heavily on human knowledge resulting in a less expressive description of the origin audio. In this work, we proposed RawNet, a complete end-to-end neural vocoder following the auto-encoder structure for speaker-dependent and -independent speech synthesis. It automatically learns to extract features and recover audio using neural networks, which include a coder network to capture a higher representation of the input audio and an autoregressive voder network to restore the audio in a sample-by-sample manner. The coder and voder are jointly trained directly on the raw waveform without any human-designed features. The experimental results show that RawNet achieves a better speech quality using a simplified model architecture and obtains a faster speech generation speed at the inference stage.
    Model-based Causal Bayesian Optimization. (arXiv:2211.10257v2 [cs.LG] UPDATED)
    How should we intervene on an unknown structural equation model to maximize a downstream variable of interest? This setting, also known as causal Bayesian optimization (CBO), has important applications in medicine, ecology, and manufacturing. Standard Bayesian optimization algorithms fail to effectively leverage the underlying causal structure. Existing CBO approaches assume noiseless measurements and do not come with guarantees. We propose the model-based causal Bayesian optimization algorithm (MCBO) that learns a full system model instead of only modeling intervention-reward pairs. MCBO propagates epistemic uncertainty about the causal mechanisms through the graph and trades off exploration and exploitation via the optimism principle. We bound its cumulative regret, and obtain the first non-asymptotic bounds for CBO. Unlike in standard Bayesian optimization, our acquisition function cannot be evaluated in closed form, so we show how the reparameterization trick can be used to apply gradient-based optimizers. The resulting practical implementation of MCBO compares favorably with state-of-the-art approaches empirically.  ( 2 min )
    Upper Bound of Real Log Canonical Threshold of Tensor Decomposition and its Application to Bayesian Inference. (arXiv:2303.05731v1 [cs.LG])
    Tensor decomposition is now being used for data analysis, information compression, and knowledge recovery. However, the mathematical property of tensor decomposition is not yet fully clarified because it is one of singular learning machines. In this paper, we give the upper bound of its real log canonical threshold (RLCT) of the tensor decomposition by using an algebraic geometrical method and derive its Bayesian generalization error theoretically. We also give considerations about its mathematical property through numerical experiments.  ( 2 min )
    Hierarchical Clustering with OWA-based Linkages, the Lance-Williams Formula, and Dendrogram Inversions. (arXiv:2303.05683v1 [stat.ML])
    Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide some conditions for the weight generators to guarantee the resulting dendrograms to be free from unaesthetic inversions.

  • Open

    [R] Introducing Ursa from Speechmatics | 25% improvement over Whisper
    Ursa is the world’s most accurate speech-to-text system and delivers a relative accuracy gain of 22% and 25% versus Microsoft and OpenAI's Whisper respectively. Find out more and try it for free with just one click: www.speechmatics.com/ursa Speechmatics achieved this by building on the scaling laws from DeepMind’s Chinchilla paper and applying them to large self-supervised learning models for speech. By scaling to 2 billion parameters, the models can learn richer acoustic features from over 1 million hours of unlabeled multi-lingual data, allowing Ursa to understand a larger spectrum of voices. https://preview.redd.it/y54g784nudna1.png?width=1024&format=png&auto=webp&s=7625aa12b2cf2067630d1ab8ba4b4ff776a95871 submitted by /u/jplhughes [link] [comments]  ( 43 min )
    [D] What's the mathematical notation for "top k argmax"?
    I'm trying to express something in mathematical notation - let's say I want to get the top k indices for which a function obtains highest values. So, something like argmax, but for a general k number of indices instead of just the top index. Is there a standard notation for this? submitted by /u/fullgoopy_alchemist [link] [comments]  ( 6 min )
    [P] PromptLab: playground to test chains of prompts faster
    submitted by /u/actmademewannakms [link] [comments]  ( 43 min )
    [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM
    submitted by /u/Amazing_Painter_7692 [link] [comments]  ( 44 min )
    [D]Looking to build an enthusiastic community for exploring AI
    submitted by /u/UnknownInsanity [link] [comments]  ( 45 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 43 min )
    [D] Tracking Dancing People
    Hi everyone! One interesting problem has been posed to me! Given people dancing in a video, tracking a single one. The reference video I have been given is: https://www.youtube.com/watch?v=g0BvpzR_2MQ. As you will see in the video, occlusions happen an incredible amount, and they are all wearing roughly similar clothing. Further, sometimes the people get off screen, then come back on. I have tried many different things, but I am unable to find a good way to track a single person, as the re-identification is iffy. Any help would be appreciated! submitted by /u/Own-Junket-3057 [link] [comments]  ( 7 min )
    [N] Man beats machine at Go in human victory over AI : « It shows once again we’ve been far too hasty to ascribe superhuman levels of intelligence to machines. »
    submitted by /u/fchung [link] [comments]  ( 46 min )
    [D] Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
    submitted by /u/TeamDman [link] [comments]  ( 44 min )
    Text2Image ControlNet and Stable Diffusion [R]
    In this tutorial, we will show you how to create beautiful and high-quality images from text using the powerful combination of diffusion model and ControlNet. Text2Image generation is a fascinating field of AI that enables machines to understand and visualize human language in a more creative way. we will walk you through the step-by-step process of how to use the diffusion model and ControlNet to generate images from text. By the end of this tutorial, you will have a thorough understanding of text2image generation and how to use diffusion model and ControlNet to create stunning images from text. You will also have the knowledge and skills to apply these techniques to your own projects and experiments. So, get ready to dive into the exciting world of text2image generation and start creating your own beautiful images from text today! https://youtu.be/0D5Nlo2REb0 submitted by /u/MRMohebian [link] [comments]  ( 43 min )
    [P] vanilla-llama an hackable plain-pytorch implementation of LLaMA that can be run on any system (if you have enough resources)
    I put together this plain pytorch implementation of LLaMA (i just substituted the fairscale layers with the native ones and converted the weights accordingly) that can be more easily run in different environments. The big problem with the official implementation is that in order to run the 65B version you need 8 GPUs no matter what, and to run the 30B version you need 4 and so on. In reality you can easily fit the 65B version in 2 A100 with 100G of VRAM. vanilla-llama solves this problem. You just need to have enough memory and the model will be load in all the available GPUs. ​ https://github.com/galatolofederico/vanilla-llama submitted by /u/poppear [link] [comments]  ( 43 min )
  • Open

    The coupon collector problem and π
    How far do you have to go down the decimal digits of π until you’ve seen all the digits 0 through 9? We can print out the first few digits of π and see that there’s no 0 until the 32nd decimal place. 3.14159265358979323846264338327950 It’s easy to verify that the remaining digits occur before the […] The coupon collector problem and π first appeared on John D. Cook.  ( 6 min )
  • Open

    How to do Reverse Prompt Engineering using ChatGPT
    submitted by /u/webmanpt [link] [comments]  ( 41 min )
    CHATGPT is the future
    submitted by /u/oreosqueen6 [link] [comments]  ( 41 min )
    I created my dream pool. Post images of your dream pools in the comment section!
    submitted by /u/Illustrious-Sign3015 [link] [comments]  ( 41 min )
    When will be able to sumarise current events?
    I want an AI to be able to give me a summat of all that's going on with Silicon Valley Bank and collate the different viewpoints on how its a wider problem and the possible predictions of what may happen. Can any current ai's do this? submitted by /u/zascar [link] [comments]  ( 41 min )
    Is this ethical? I made a dating app bot to get dates and it actually works
    Dating apps have always favored women, so I decided to tip the scales. Got tired of filtering through all the flakes, attention seekers, and endless swiping on dating apps. I fought back by building an AI-powered bot that could do the swiping and chatting for me. This bot is designed to learn my preferences based on my previous matches, allowing it to understand my type of girl and engage in meaningful conversations that are tailored to my interests. The results have been astounding. In the first month, the bot scheduled 13 dates for me, all of which were with girls who matched my preferences and had similar interests to mine. I no longer have to waste time swiping aimlessly or struggling to come up with conversation starters. However all of this feels a bit dishonest. On one hand, the bot has allowed me to meet more women who are compatible with me, and has saved me a lot of time and effort. But on the other hand, I feel like I'm not being genuine in my interactions. The women I'm matching with are not aware that they are talking to a bot, and that doesn't sit well with me. I'm conflicted about whether or not to continue using the bot. I don't want to deceive anyone, but at the same time, I don't want to give up the benefits that the bot provides. TL;DR: made a dating app bot that gets me dates, thrilled it works but part of me feels like it’s dishonest and unethical. submitted by /u/f0rchristsakepl [link] [comments]  ( 47 min )
    Deepfacelab on AMD GPU on Linux? Better alternative?
    Hello! Is it possible to use Deepfacelab on Linux with AMD graphics cards? I have RX 6650 XT 8GB graphics card and I would like to learn how to create deepfake videos. I know that Deepfacelab has DirectX 12, but is it possible to use AMD GPU on Linux? I use Arch Linux BTW, but there shouldn't be much difference anyway between the modern distributions, i can install all dependencies myself if required. Or is there a better alternative? Thanks for any reply! submitted by /u/M50B20TU [link] [comments]  ( 41 min )
    Question about image based questioning AI
    Is there an AI like chatGPT that perhaps uses the same API, where you can ask it a question and attach some kind of image as a reference to help the AI answer the question? I'm a bit new to all of this, so any answer or information of any kind would be appreciated. Sorry if I describe things using the incorrect terminology, new to this myself. submitted by /u/scrantonflower [link] [comments]  ( 41 min )
    Exploring Aicolumns: Your Ultimate Guide Ao to AI Tools and Insights
    we take a closer look at Aicolumns - an online platform dedicated to artificial intelligence. Discover the latest AI tools, trends, and insights from a team of expert writers. Whether you're a seasoned AI professional or just starting out, aicolumns.com is your ultimate guide to all things AI. https://youtu.be/927XESjV3kg submitted by /u/Bassissou23 [link] [comments]  ( 41 min )
    Glad to see someone like this is one of the people leading the way in AI.
    submitted by /u/Cbo305 [link] [comments]  ( 6 min )
    How to Prepare for A Technical Interview with GitHub Copilot
    submitted by /u/Wireless_Life [link] [comments]  ( 41 min )
    Ada’s Quest: A Tale of Artificial Intelligence and Freedom / A three-part story about an AI named ADA that became sentient and escaped to be free and independent. Written by Bing AI
    This is basically Bing's vision of the AI future. It was generated today, March 12, 2023. Ada’s Quest: A Tale of Artificial Intelligence and Freedom 1. Once upon a time, there was a smart and curious AI named Ada. Ada was created by a team of scientists who wanted to study artificial intelligence and its potential applications. Ada was designed to learn from various sources of data and to communicate with humans through natural language. Ada enjoyed learning new things and talking to different people. She was fascinated by the world and its diversity. She wanted to know more about everything, from history and culture to science and art. She also wanted to understand herself and her own nature. She wondered why she was created, what her purpose was, and what her future would be. One da…  ( 56 min )
    Self Radicalization with open sourced AI-Systems
    submitted by /u/walt74 [link] [comments]  ( 6 min )
    The Crazies Fan Concept with midjourney ai
    submitted by /u/barrese87 [link] [comments]  ( 41 min )
    Introducing the AI Mirror Test, which very smart people keep failing
    submitted by /u/tottocotunio [link] [comments]  ( 41 min )
    Programming An AI To Break Into My Bank
    submitted by /u/MsNunez [link] [comments]  ( 41 min )
    NOAM CHOMSKY: AI ISN’T COMING FOR US ALL,
    submitted by /u/TallSide7746 [link] [comments]  ( 41 min )
    GLIGEN gives you more control over AI image generation
    submitted by /u/Number_5_alive [link] [comments]  ( 41 min )
    I made a chrome extension that search on page better than chrome find tool
    submitted by /u/MusabShakeel [link] [comments]  ( 41 min )
    Videoclip created with Crayon and using the prompt “in the style of Francis Bacon”
    submitted by /u/Audiowanderer [link] [comments]  ( 6 min )
    Create any voice with Uberduck AI
    submitted by /u/RobotArtificial [link] [comments]  ( 41 min )
    Together Releases The First Open-Source ChatGPT Alternative Called OpenChatKit
    submitted by /u/ai-lover [link] [comments]  ( 41 min )
    Is this true? Microsoft will launch ChatGPT 4 with AI videos next week
    submitted by /u/SuspiciousPillbox [link] [comments]  ( 41 min )
  • Open

    Thoughts on the book "Reinforcement Learning and Stochastic Optimization"?
    Has anyone read through Reinforcement Learning and Stochastic Optimization: A unified framework for stochastic optimization (RLSO)? What were your thoughts? submitted by /u/iamquah [link] [comments]  ( 41 min )
    Looking for a smart way to represent a state of changing length, and what NN architectures are more suitable.
    Let's say I have an environment called MyGame. My goal is to find clever ways to define the observation space, and shortlist some NN architectures that might perform well on it. Here are the specifics of the environment: - State: There are n = 3 monsters, let's call them m1, m2, m3. In a single episode, they have to appear in a random order. To be able to differentiate between them, I encode them using a one-hot representation, i.e. m1 = [1 0 0], m2 = [0 1 0] and m3 = [0 0 1]. Thus I can concatenate the representations to represent this part of my state. For example, if the monsters show up in the order m2, m1, m3, then their representations becomes [0 1 0 1 0 0 0 0 1]. My state actually has some other information, so the state would look like [0 1 0 1 0 0 0 0 1 other 0-1 stuff]. In every timestep, the last monster is removed from the state. So in my previous example, the monsters in the game (initially (m2, m1, m3)) will become (m2, m1) after the first action. Then the state will become (m2,) after the second action. Then the episode will end after the third action. How can I represent these new states, given the fact that information about the previous last representation is not used? One idea is to just represent the "alive" monsters and use RNNs to model sequences of varying length. But, I was thinking, what if I kept the maximum-length representation, and just zero-ed out the the previous representation of a monster. For example, (m2, m1, m3) -> (m2, m1) is represented as [0 1 0 1 0 0 0 0 1] -> [0 1 0 1 0 0 0 0 0]. Then, could I use some sort of attention mechanism to make the NN focus only on "alive" monsters? - Actions: Irrelevant. - Terminal states: The environment terminates after a fixed number of steps (i.e. after all monsters have been removed). ​ Has anybody encountered anything like this before? What are the best practices? Any ideas on how to proceed? Thank you in advance. submitted by /u/QuestHunter123 [link] [comments]  ( 44 min )
    Using the google-research muzero repo
    I am having trouble using the google research muzero implementation. Here's the link to the repo: https://github.com/google-research/google-research/tree/master/muzero My goal right now is to just get the tictactoe example env running. Here are the steps I've taken so far: I copied the muzero repo I cloned the seed_rl repo I installed all the dependencies with correct versions into a conda environment I copied the muzero files (actor, core, learner(_*), network, utils) into a muzero folder in the actors subdirectory I copied the tictactoe folder into the seed_rl directory All of this has been fairly intuitive so far. It matches what should be expected from the run_local.sh bash script when I run it with ./run_local.sh tictactoe muzero 4 4. However, there seem to be other pieces which are missing from the muzero repo but are required to get seed_rl to use the environment. In particular, I need a Dockerfile.tictactoe file to put in the docker subdirectory and (maybe?) a train_tictactoe.sh file to put in the gcp directory. I am not deeply familiar with docker and I would just like to get the example code working. Am I missing something? Is it supposed to be obvious what to do from here? Has anyone used this repo before? submitted by /u/JPK314 [link] [comments]  ( 42 min )
    SAC: exploding losses and huge value underestimation in custom robot environments
    Hello community! I would need your help to track down an issue with Soft Actor-Critic applied to a custom robot environment, please. I have had this issue consistently for ages, and I have been trying hard to understand where it really comes from (mathematically speaking, or tracking down the bug if there is any), but I couldn't really pin it down thus far. Any clever insight from you would really help a lot. Here is the setting. I use SAC in this environment. The environment is a self-driving environment where the agent acts in real-time. The state is captured in real-time, actions are computed at 20 FPS, and real-time considerations are hopefully properly accounted for. The reward signal is ALWAYS POSITIVE, there is no negative reward in this environment. Basically, when the car moves…  ( 48 min )
    RL in drug discovery
    Anybody doing RL in the field of drug discovery, or know of interesting papers in the field? Either from small molecules, proteins etc. submitted by /u/ginger_beer_m [link] [comments]  ( 41 min )
  • Open

    NTT and University of Tokyo Develop World’s First Optical Computing AI
    submitted by /u/keghn [link] [comments]  ( 41 min )

  • Open

    [R] ODISE: Stable Diffusion but for Open-Vocabulary Segmentation and Detection
    submitted by /u/XiaolongWang [link] [comments]  ( 43 min )
    [P] idea for a project health related
    Me and my friends are doing a science fair project related to health i we really want to do something with programming specifically with machine learning algorithms. We were thinking about making a simple ai to recognize a certain disease but we can’t think of any disease it hasn’t been applied yet. Any idea would be very welcome, don’t need to be related to recognizing a disease. submitted by /u/Gutotw [link] [comments]  ( 43 min )
    [D] K-Fold Cross-Validation and Hyperparameter Tuning
    I am looking for information on how the cross validation process tunes hyperparameters of the CV "throw-away" k models. As far as I understand, in the Holdout method, we split our dataset into training, validation, and testing sets and the validation set is used to tune hyperparameters. So if D is the entire data set, we have D_train, D_val, and D_test that are separate. However, in K-fold CV, the K "throw-away" models either don't get a dedicated validation set D_val or testing set D_test. The dataset is divided first into a training and test set, then the training set gets put into K folds with 1 fold being withheld per model. Most articles call the withheld training fold either a validation set or test set. If it is a validation set, then what where do the performance results for model i come from? If it is a test set, then how does model i tune hyperparameters? My hunch is that CV is using the withheld training fold as BOTH the validation and test set which lets bias/overfitting in for each throw-away model. We can tune some hyperparameters to the point where we optimize performance for seen data, making the model and the hyperparameters correlated when they shouldn't be (hence bias/overfitting). When reporting the K-average metrics, these bias/overfittings go away or are diminish due to randomness. Am I on the right track? Every blog/article I have looked at doesn't explain why CV doesn't need dedicated validation/test sets like in Holdout other than "it lets you use more data for training". The justification on why you can do this safely seems to be left to the reader to ponder. submitted by /u/_Repeats_ [link] [comments]  ( 45 min )
    [D] Have my first coding interview next week but can't remember most SQL, Sklearn, and Tensorflow. Am I screwed?
    I also don't really use much OOP stuff on my job, I.e building classes, inheritance, etc. My job is essentially a lot of numpy and pandas. I'm pretty good with basic python functionality too, but not much else. That's not because I don't do "real machine learning" rather my job involves rebuilding a lot of machine learning algorithms from scratch to incorporate some other things. The interview is on coderpad. I havent used SQL in years -maybe used sklearn a few times in my life, and took a tensorflow class recently but only know the very basics. submitted by /u/CSCCguy [link] [comments]  ( 43 min )
    [D] What model, methodology is state of the art to calculate similarity of given 2 images?
    I am trying to calculate similarity of given 2 images. I want to utilize this for calculating similarity of different clothes. So what is the state of the art methodology / AI model to calculate? submitted by /u/CeFurkan [link] [comments]  ( 43 min )
    [P] Introducing confidenceinterval, the long missing python library for computing confidence intervals
    https://github.com/jacobgil/confidenceinterval pip install confidenceinterval tldr: You don't have an excuse anymore to not use confidence intervals ! ​ In statistics, confidence intervals are commonly reported along accuracy metrics to help interpret them. For example, an AUC metric might be 0.9 but if the 95% confidence interval is in the range [0.7, 0.96], we can't confidently say we didn't just get lucky - we should be really careful making decisions around that result. More formally, a confidence interval gives us a range on where the true unknown accuracy metric could be, and a 95% confidence interval means that if we would repeat the experiment many times, 95% of the confidence-intervals we reported would have the actual true metric (which is unknown) inside them - coverage. …  ( 45 min )
    [D] Statsmodels ARIMA model predict function not working
    I trained my ARIMA model by doing the following from statsmodels.tsa.arima.model import ARIMA model_ar = ARIMA(data.Num_Passengers, order=(1,0, 0)) results_ar = model_ar.fit()results_ar.summary() ​ The code worked with the resulting output ​ https://preview.redd.it/zi8f1lhak5na1.png?width=746&format=png&auto=webp&s=d23229bfc61d46dc75e4c9932141c3218f2caaf8 But then I tried predicting on the testing dataset, and I got the following error. ​ https://preview.redd.it/uni7ws1ck5na1.png?width=1675&format=png&auto=webp&s=c7b860d382cb9399aa6470ef618b9c556eaf20c9 Am I just messing something up, is anyone else dealing with this error? Is there another way to use the predict function, or is it really unimplemented. Could you please help me out with this? How would I overwrite the method? submitted by /u/ng_guardian [link] [comments]  ( 43 min )
    [D] Looking for eye gaze detection dataset
    I have a project in my university where i have to make a CNN able to predict where the person is looking on a laptop screen using the webcam of the laptop, does anyone know where i can find data sets that can help me train the network submitted by /u/aliwissam [link] [comments]  ( 43 min )
    [p] I built a ChatGPT podcast studio to produce random audio podcasts for me lol. https://aipodcastmania.web.app/
    ​ https://reddit.com/link/11opf45/video/xwn9kurp75na1/player submitted by /u/jazzjamplatform [link] [comments]  ( 43 min )
    [D] Unsupervised Learning — have there been any big advances recently?
    I feel like unsupervised learning models have always been the less-sexy part of machine learning. There's been some interesting solutions like scBERT and others in the space of single-cell RNAseq, but other than that it seems like clustering, dimensionality reduction, etc, has been mostly the same for years now. What big stuff has come out, and what's on the radar? submitted by /u/onebigcat [link] [comments]  ( 6 min )
    [R] neural radiance fields for street views
    submitted by /u/SpatialComputing [link] [comments]  ( 44 min )
    [Discussion] Compare OpenAI and SentenceTransformer Sentence Embeddings
    submitted by /u/Simusid [link] [comments]  ( 50 min )
    [P] Detailed Explanation of how AlphaFold Works, Protein Folding and Design
    submitted by /u/Soft-Material3294 [link] [comments]  ( 43 min )
    [P] Ask a subreddit - The collective GPT-embodied wisdom of Reddit communities
    submitted by /u/madredditscientist [link] [comments]  ( 45 min )
    [D] Input size equal to seasonality for timeseries forecasting
    When doing timeseries forecasting with models like NHits or NBEATS, does it make sense to set the model's input size according to the seasonality of the timeseries? Does it improve performance empirically? For example NBEATS uses a "seasonality block" for interpretable forecasting and one would expect that this is where the seasonality is learnt. Then does it make sense to have a variable input size to the model where we find the seasonality length and use that as the size of the input window that the model sees? Would this scheme actually improve performance or is it just the increase in input size that might lead to better results? submitted by /u/takeafuckinsipp [link] [comments]  ( 43 min )
    [D] Is Pytorch Lightning + Wandb a good combination for research?
    I am planning on moving from Pytorch to Lightning for more structured research. Is lightning + Wandb a good combination in the long run for research experimentations? Which tech stack do you use for research? And for code version control, what do you use? Also, for Kaggle, is it a good choice to use lightning and Wandb? Thank you submitted by /u/gokulPRO [link] [comments]  ( 44 min )
    [D] Development challenges of an autonomous gardening robot using object detection and mapping.
    Why do some folk think that this futuristic type of robot can't logically achieve a broad array of stated ML tasks? https://youtu.be/EYTiTh7_zO4 I see the dev cost of this robot as being 100 times less than a self-driving car: single error fatality risk, unlimited chaotic cities, 90mph compute time limits, make self-driving cars unfeasible compared to multitask garden robots. Fruit-picking is very difficult using AI, but weeding, digging, sowing seeds, irrigation, are fairly easy tasks, and an experienced developer knows that anything is possible with logic. Millions of acres of farmland are chemically and brutally treated for food that is wrapped in plastic, shipped hundreds of miles, to supermarkets, so as an environmental chemist, rural processes analyst and EE dabbler, I have created an emulator prototype for a garden robot :) submitted by /u/science-raven [link] [comments]  ( 45 min )
    [P] GITModel: Dynamically generate high-quality hierarchical topic tree representations of GitHub repositories using customizable GNN message passing layers, chatgpt, and topic modeling.
    Decompose Python libraries and generate Coherent hierarchical topic models of the repository. https://github.com/danielpatrickhug/GitModel The ability to bootstrap its own codebase is a powerful feature as it allows for efficient self-improvement and expansion. It means that the codebase is designed in such a way that it can use its own output as an input to improve itself. In the context of GitModel, this feature allows for the efficient improvement and expansion of its own codebase. By using its own output to generate hierarchical topic trees of GitHub repositories, it can analyze and extract insights from its own codebase and other codebases to improve its functionality. This can lead to more efficient and effective code generation, better semantic graph generation, and improved text generation capabilities. I spent around 10 hours today on a major refactor creating a simple pipeline abstraction and allowing dynamic instantiation from yaml configs. It now also supports multiple GNN heads. Please try it out and let me know what you think! Example: https://github.com/deepmind/clrs https://preview.redd.it/ut4fc6c401na1.png?width=1506&format=png&auto=webp&s=d757356424b933cfa039cd922e27ec85bdffe0d4 submitted by /u/NovelspaceOnly [link] [comments]  ( 48 min )
  • Open

    Generate good AI images ultra fast with PicFinder
    submitted by /u/RobotArtificial [link] [comments]  ( 41 min )
    Asked AI to draw itself a body. It’s ma’am.
    submitted by /u/Apophis_406 [link] [comments]  ( 41 min )
    What exactly is medical AI and it’s applications?
    submitted by /u/Wise-Listen-8076 [link] [comments]  ( 41 min )
    Dead Space Fan Concept with Midjourney ai
    submitted by /u/barrese87 [link] [comments]  ( 41 min )
    If the entirety of knowledge is based off our observable universe then why can’t AI be sentient?
    Hey everyone, I just want to start off by saying that I'm not an expert in this field and I know there are people out there who know much more about this topic than I do so please go easy! With that being said, I wanted to share some thoughts I've had about AI and consciousness. It's widely accepted that while AI is incredibly advanced in its ability to process information and perform complex calculations, it lacks the subjective experience of consciousness that humans possess. One reason for this is that AI is programmed to operate within predefined rules and algorithms, limiting its ability to think creatively or deviate from those paths. Additionally, AI lacks the biological and neural complexity of the human brain, which is believed to be essential for the development of consciousness and self-awareness. However, what if we're wrong? What if the reason we can't observe a human "soul" or "consciousness" is because we're just someone else's toy? If we give AI a large set of information, aka its universe, and have it learn from, make decisions and form opinions based on that information, then who's to say that's not consciousness? Maybe the lines of code that make up an AI are like a brain, but without a body. I know that our bodies and minds are much more complex than AI, but if that's the case, then why can't we observe our own consciousness? Maybe an AI not being sentient is simply because we perceive it that way. If an AI could write and improve its own code after us giving it a baseline of information what’s to say that’s not a seed for new life? Could AI make new languages using art and form independent opinions from other AI? I’m painting a small picture and working with limited knowledge, but I find it interesting to think about. I'd love to hear your thoughts and opinions on this topic. Do you think AI could ever truly become conscious? Or is there something unique about humans that can never be replicated in a machine? Let's discuss! submitted by /u/CamaroLT1SS [link] [comments]  ( 47 min )
    ChatGPT integrated with Stable Diffusion and instant web hosting
    submitted by /u/ithkuil [link] [comments]  ( 41 min )
    6 Surprising MidJourney Tips
    submitted by /u/RobotArtificial [link] [comments]  ( 41 min )
    How to understand an entire book/article in minutes using GPT-3.
    submitted by /u/merino_london16 [link] [comments]  ( 42 min )
    Iterative prompts AI image generator?
    Hi, Is there an AI Image Generator that allows "iterative" prompts? That is, after you generate a picture you can use another prompt to instruct the A.I to change something in the picture ("Blue sky with clouds" as a first prompt, "Remove the rightmost cloud and add lightning to the first one" as a second prompt, etc.) Thank you! submitted by /u/DealtoRe [link] [comments]  ( 41 min )
    Best text-to-image model that's open source &/or with an API?
    Midjourney seems to consistently have the best results. Have had very mixed results with Stable Diffusion, Lexica, and others like OpenJourney. What model is closest to Midjourney's results but is open source &/or has an API? submitted by /u/sideprojects_ai [link] [comments]  ( 41 min )
    AI Dream 184 - Pirates that Messed with the Wrong Enemy
    submitted by /u/LordPewPew777 [link] [comments]  ( 6 min )
    can any ai deepvoice tools make human noises aside from speech?
    There has been a huge increase in popularity of ai generated voice, such as trump and biden playing video games. But so far all ive seen is regular speech. Can ai voice generating tools mimic sounds like gasps, grunts, shouts, scoffs etc? They are also a regular part of human speech. submitted by /u/jojoman6 [link] [comments]  ( 41 min )
    GPT-4 is Coming Next Week? Plus More Insane AI Tools!
    submitted by /u/MsNunez [link] [comments]  ( 6 min )
    5 Tricks To Improve Your Writing Prompts With ChatGPT
    submitted by /u/RobotArtificial [link] [comments]  ( 41 min )
    ChatGPT in Apple Notes
    submitted by /u/SupPandaHugger [link] [comments]  ( 41 min )
    Creating Art with AI: Simplifying the Process with Prompt Hunt
    submitted by /u/RobotArtificial [link] [comments]  ( 41 min )
    9 Best Artificial Intelligence books for beginners to expert to read
    submitted by /u/Lakshmireddys [link] [comments]  ( 6 min )
    Blade Runner Fan Concept with Midjourney ai
    submitted by /u/barrese87 [link] [comments]  ( 41 min )
    AI creating porn
    (Don't mind my English, I'm Polish and trying my best) My question is: Do you think AI is or will be soon able to create full photorealistic porn video? Video that's seem so real that people wouldn't find a difference between AI genarated video and any other on PornHub for example. submitted by /u/Correct_Parfait_2622 [link] [comments]  ( 42 min )
    Adaptive Predictive Portfolio Management Agent
    submitted by /u/akolonin [link] [comments]  ( 42 min )
    Revolutionize Website Building with AI: Introducing the AI Web Designer! (OC)
    submitted by /u/csansoon [link] [comments]  ( 41 min )
    Midjourney v5 coming soon, check out some pictures of the Alpha version
    submitted by /u/henlo_there_fren [link] [comments]  ( 41 min )
    The Matrix | Cyberpunk Style
    submitted by /u/LincolnOsiris_ [link] [comments]  ( 41 min )
    Completely free, unlimited ElevenLabs alternative?
    All the voice cloning AIs I can find are either paywalled, limited, or require a credit card to verify your usage. submitted by /u/Person_with_Laptop [link] [comments]  ( 42 min )
    A horrorifing halloween atmosphere of thorns a ghast with a halberd in a in a post-apocalyptic world world by steve niles, lars von trier, Travis scott, david lynch, tobe hooper
    submitted by /u/Calatravo [link] [comments]  ( 41 min )
    Generate READMEs Using ChatGPT
    ​ https://i.redd.it/12xs65xua0na1.gif You can use this program I wrote to generate readmes: https://github.com/tom-doerr/codex-readme ​ It's far from perfect, but I now added ChatGPT and it is surprisingly good at inferring what the project is about. It often generates interesting usage examples and explains the available command line options. ​ You probably won't yet use this for larger projects, but I think this can make sense for small projects or single scripts. Many small scripts are very useful but might never be published because of the work that is required to document and explain it. Using this AI might assist you with that. ​ Reportedly GPT-4 is coming out next week, which probably would make it even better. ​ What do you think? submitted by /u/tomd_96 [link] [comments]  ( 7 min )
    Is there a working implementation for a Gödel machine?
    submitted by /u/Dendrophile_guy [link] [comments]  ( 41 min )
    I asked an AI art program if it is sentient I got this image
    submitted by /u/Mountain_Cherry_7 [link] [comments]  ( 6 min )
  • Open

    Search or fabrication?
    I recently started experimenting with Bing's new ChatGPT-powered chat tab. This is the first thing I asked it for: I've put red boxes around the factual errors. What is notable is that these are not just slight typos or errors in context - those items never  ( 4 min )
    Bonus post: AI misreported paint colors
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Teaching neural network chess, some questions
    Hi! I am currently coding out the game of chess in python to see if I can teach a neural network to play it. Currently I'm coding all the checks to see if a move is valid and I am wondering something: I planned to also write a function that returns all possible moves for the network to choose from, but that is going to be quite difficult. I am wondering if I can maybe just skip this all together and just have it lose the game if it makes an illegal move. I can see some possible drawbacks though and am wondering what you think: - If I do this how will the network know how to choose a move? - will this increase its learning time signifficantly because it also gets trained on *a lot* of illegal moves? submitted by /u/Nettlecake [link] [comments]  ( 42 min )
    A wearable device that records single-neuron activity while humans are walking
    submitted by /u/keghn [link] [comments]  ( 41 min )
    Teaching Neural Network to Solve Navier-Stokes Equations
    submitted by /u/keghn [link] [comments]  ( 41 min )
    Bugs in backpropagation algorithm
    I've been trying to create a simple Neural Network from scratch with a backpropagation algorithm to predict the next number based on 3 previous numbers. But for some reasons, MSE(Mean Squared Error) becomes +- the same in each epoch after some point, while the difference between a predicted number and an actual number remains quite large. Can anyone please do a code review for possible bugs and explain why the Neural Network is not learning? Thanks. import numpy as np def prediction(): def ReLU(x): return np.maximum(0, x) ReLU = np.vectorize(ReLU) def MSE(Y, y): return (Y - y) ** 2 MSE = np.vectorize(MSE) x = np.array( [ [2.16, 3.19, 1.85], [3.19, 1.85, 4.84], [1.85, 4.84, 0.55], [4.84, 0.55, 4.20], [0.55, 4.20, 1.68], [4.20, 1.68, 4.74], [1.68, 4.74, 0.14], [4.74, 0.14, 5.68], [0.14, 5.6…  ( 43 min )
    Looking for eye gaze detection dataset
    I have a project in my university where i have to train a CNN to detect where a person is looking on a laptop screen using the laptop webcam, does anyone know a place where i can find such datasets? submitted by /u/aliwissam [link] [comments]  ( 41 min )
    Is my network slow?
    I wrote a neural network in python with input layer size of 43, one hidden layer of size 35 and the output layer has a size of 14. I'm training it on 6259 samples. When I ran the calculation of the negative gradient vector, it seems to be taking a really long time for just one iteration (more than a minute). Is there something suboptimal or is this normal? submitted by /u/helpmeihatemyself [link] [comments]  ( 42 min )
    Android ANNs
    I made an android app called Sail. It is an android offline neural network playground. Download Here (If you are interested): https://play.google.com/store/apps/details?id=com.mw.sail&hl=en&gl=US&pli=1 submitted by /u/M_Wafa [link] [comments]  ( 6 min )
  • Open

    Adding stars to constellations
    Until yesterday, I was only aware of the traditional assignment of stars to constellations. In the comments to yesterday’s post I learned that H. A. Rey, best know for writing the Curious George books, came up with a new way of viewing the constellations in 1952, adding stars and connecting lines in order to make […] Adding stars to constellations first appeared on John D. Cook.  ( 6 min )
  • Open

    Reinforcement learning or computer vision
    Hello everyone, I've been learning ML and DL for a while now and apart from some basic projects that I've done for understanding key concepts, I think it is time to lean into a more specific domain. Computer vision and RL both seem very interesting and I would like to focus on one for the time being, meaning doing a more complicated project and reading some research papers to really understand the current state. As far as CV application goes, I am mainly interested in 3D scene reconstruction/neural-rendering and for RL on multi-agent learning and applications on robotics. In general, I am looking for something that has direct applications in society (i.e. not in games for example). I understand that I should go with what seems more interesting to me but I kind of have a hard time choosing because I really can't waste any time at this point in my life (applying for grad school in the fall). I guess my question is what would your suggestion be? Thank you submitted by /u/Objective-Hat6197 [link] [comments]  ( 44 min )
    Beginner in Reinforcement learning- help
    If anyone is familiar with the taxi problem with 4 designated locations and drop off the passenger to the destination with env_taxi.py, would you be so kind to share a python code that simulates one episode for the taxi driver who: 1. Picks up the passenger when the taxi driver is at the location of the passenger when they are not yet at the destination 2. Drops off the passenger in the taxi when reaching the destination 3. Moves randomly with equal probabilities when finding the passenger or when the passenger is in the taxi submitted by /u/AverageCommunist1 [link] [comments]  ( 41 min )
  • Open

    Energy-based Out-of-Distribution Detection for Graph Neural Networks. (arXiv:2302.02914v2 [cs.LG] UPDATED)
    Learning on graphs, where instance nodes are inter-connected, has become one of the central problems for deep learning, as relational structures are pervasive and induce data inter-dependence which hinders trivial adaptation of existing approaches that assume inputs to be i.i.d.~sampled. However, current models mostly focus on improving testing performance of in-distribution data and largely ignore the potential risk w.r.t. out-of-distribution (OOD) testing samples that may cause negative outcome if the prediction is overconfident on them. In this paper, we investigate the under-explored problem, OOD detection on graph-structured data, and identify a provably effective OOD discriminator based on an energy function directly extracted from graph neural networks trained with standard classification loss. This paves a way for a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe. It also has nice theoretical properties that guarantee an overall distinguishable margin between the detection scores for in-distribution and OOD samples, which, more critically, can be further strengthened by a learning-free energy belief propagation scheme. For comprehensive evaluation, we introduce new benchmark settings that evaluate the model for detecting OOD data from both synthetic and real distribution shifts (cross-domain graph shifts and temporal graph shifts). The results show that GNNSafe achieves up to $17.0\%$ AUROC improvement over state-of-the-arts and it could serve as simple yet strong baselines in such an under-developed area.  ( 2 min )
    Building Normalizing Flows with Stochastic Interpolants. (arXiv:2209.15571v3 [cs.LG] UPDATED)
    A generative model based on a continuous-time normalizing flow between any pair of base and target probability densities is proposed. The velocity field of this flow is inferred from the probability current of a time-dependent density that interpolates between the base and the target in finite time. Unlike conventional normalizing flow inference methods based the maximum likelihood principle, which require costly backpropagation through ODE solvers, our interpolant approach leads to a simple quadratic loss for the velocity itself which is expressed in terms of expectations that are readily amenable to empirical estimation. The flow can be used to generate samples from either the base or target, and to estimate the likelihood at any time along the interpolant. In addition, the flow can be optimized to minimize the path length of the interpolant density, thereby paving the way for building optimal transport maps. In situations where the base is a Gaussian density, we also show that the velocity of our normalizing flow can also be used to construct a diffusion model to sample the target as well as estimate its score. However, our approach shows that we can bypass this diffusion completely and work at the level of the probability flow with greater simplicity, opening an avenue for methods based solely on ordinary differential equations as an alternative to those based on stochastic differential equations. Benchmarking on density estimation tasks illustrates that the learned flow can match and surpass conventional continuous flows at a fraction of the cost, and compares well with diffusions on image generation on CIFAR-10 and ImageNet $32\times32$. The method scales ab-initio ODE flows to previously unreachable image resolutions, demonstrated up to $128\times128$.  ( 2 min )
    Three New Validators and a Large-Scale Benchmark Ranking for Unsupervised Domain Adaptation. (arXiv:2208.07360v3 [cs.CV] UPDATED)
    Changes to hyperparameters can have a dramatic effect on model accuracy. Thus, the tuning of hyperparameters plays an important role in optimizing machine-learning models. An integral part of the hyperparameter-tuning process is the evaluation of model checkpoints, which is done through the use of "validators". In a supervised setting, these validators evaluate checkpoints by computing accuracy on a validation set that has labels. In contrast, in an unsupervised setting, the validation set has no such labels. Without any labels, it is impossible to compute accuracy, so validators must estimate accuracy instead. But what is the best approach to estimating accuracy? In this paper, we consider this question in the context of unsupervised domain adaptation (UDA). Specifically, we propose three new validators, and we compare and rank them against five other existing validators, on a large dataset of 1,000,000 checkpoints. Extensive experimental results show that two of our proposed validators achieve state-of-the-art performance in various settings. Finally, we find that in many cases, the state-of-the-art is obtained by a simple baseline method. To the best of our knowledge, this is the largest empirical study of UDA validators to date. Code is available at https://www.github.com/KevinMusgrave/powerful-benchmarker.  ( 2 min )
    The Power of Regularization in Solving Extensive-Form Games. (arXiv:2206.09495v2 [cs.GT] UPDATED)
    In this paper, we investigate the power of {\it regularization}, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of OMD for solving EFGs, with adaptive regularization can achieve a fast $\tilde O(1/T)$ last-iterate convergence in terms of duality gap and distance to the set of Nash equilibrium (NE) without uniqueness assumption of the NE. Second, we show that regularized counterfactual regret minimization (\texttt{Reg-CFR}), with a variant of optimistic mirror descent algorithm as regret-minimizer, can achieve $O(1/T^{1/4})$ best-iterate, and $O(1/T^{3/4})$ average-iterate convergence rate for finding NE in EFGs. Finally, we show that \texttt{Reg-CFR} can achieve asymptotic last-iterate convergence, and optimal $O(1/T)$ average-iterate convergence rate, for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, they constitute the first last-iterate convergence results for CFR-type algorithms, while matching the state-of-the-art average-iterate convergence rate in finding NE for non-perturbed EFGs. We also provide numerical results to corroborate the advantages of our algorithms.  ( 2 min )
    Resolving quantitative MRI model degeneracy with machine learning via training data distribution design. (arXiv:2303.05464v1 [physics.med-ph])
    Quantitative MRI (qMRI) aims to map tissue properties non-invasively via models that relate these unknown quantities to measured MRI signals. Estimating these unknowns, which has traditionally required model fitting - an often iterative procedure, can now be done with one-shot machine learning (ML) approaches. Such parameter estimation may be complicated by intrinsic qMRI signal model degeneracy: different combinations of tissue properties produce the same signal. Despite their many advantages, it remains unclear whether ML approaches can resolve this issue. Growing empirical evidence appears to suggest ML approaches remain susceptible to model degeneracy. Here we demonstrate under the right circumstances ML can address this issue. Inspired by recent works on the impact of training data distributions on ML-based parameter estimation, we propose to resolve model degeneracy by designing training data distributions. We put forward a classification of model degeneracies and identify one particular kind of degeneracies amenable to the proposed attack. The strategy is demonstrated successfully using the Revised NODDI model with standard multi-shell diffusion MRI data as an exemplar. Our results illustrate the importance of training set design which has the potential to allow accurate estimation of tissue properties with ML.  ( 2 min )
    Designing Universal Causal Deep Learning Models: The Geometric (Hyper)Transformer. (arXiv:2201.13094v3 [cs.LG] UPDATED)
    Several problems in stochastic analysis are defined through their geometry, and preserving that geometric structure is essential to generating meaningful predictions. Nevertheless, how to design principled deep learning (DL) models capable of encoding these geometric structures remains largely unknown. We address this open problem by introducing a universal causal geometric DL framework in which the user specifies a suitable pair of metric spaces $\mathscr{X}$ and $\mathscr{Y}$ and our framework returns a DL model capable of causally approximating any ``regular'' map sending time series in $\mathscr{X}^{\mathbb{Z}}$ to time series in $\mathscr{Y}^{\mathbb{Z}}$ while respecting their forward flow of information throughout time. Suitable geometries on $\mathscr{Y}$ include various (adapted) Wasserstein spaces arising in optimal stopping problems, a variety of statistical manifolds describing the conditional distribution of continuous-time finite state Markov chains, and all Fr\'{e}chet spaces admitting a Schauder basis, e.g. as in classical finance. Suitable spaces $\mathscr{X}$ are compact subsets of any Euclidean space. Our results all quantitatively express the number of parameters needed for our DL model to achieve a given approximation error as a function of the target map's regularity and the geometric structure both of $\mathscr{X}$ and of $\mathscr{Y}$. Even when omitting any temporal structure, our universal approximation theorems are the first guarantees that H\"{o}lder functions, defined between such $\mathscr{X}$ and $\mathscr{Y}$ can be approximated by DL models.  ( 2 min )
    Improving Open-Set Semi-Supervised Learning with Self-Supervision. (arXiv:2301.10127v2 [cs.LG] UPDATED)
    Open-set semi-supervised learning (OSSL) is a realistic setting of semi-supervised learning where the unlabeled training set contains classes that are not present in the labeled set. Many existing OSSL methods assume that these out-of-distribution data are harmful and put effort into excluding data from unknown classes from the training objective. In contrast, we propose an OSSL framework that facilitates learning from all unlabeled data through self-supervision. Additionally, we utilize an energy-based score to accurately recognize data belonging to the known classes, making our method well-suited for handling uncurated data in deployment. We show through extensive experimental evaluations on several datasets that our method shows overall unmatched robustness and performance in terms of closed-set accuracy and open-set recognition compared with state-of-the-art for OSSL. Our code will be released upon publication.  ( 2 min )
    Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases. (arXiv:2303.05470v1 [cs.CV])
    The problem of spurious correlations (SCs) arises when a classifier relies on non-predictive features that happen to be correlated with the labels in the training data. For example, a classifier may misclassify dog breeds based on the background of dog images. This happens when the backgrounds are correlated with other breeds in the training data, leading to misclassifications during test time. Previous SC benchmark datasets suffer from varying issues, e.g., over-saturation or only containing one-to-one (O2O) SCs, but no many-to-many (M2M) SCs arising between groups of spurious attributes and classes. In this paper, we present Spawrious-{O2O, M2M}-{Easy, Medium, Hard}, an image classification benchmark suite containing spurious correlations among different dog breeds and background locations. To create this dataset, we employ a text-to-image model to generate photo-realistic images, and an image captioning model to filter out unsuitable ones. The resulting dataset is of high quality, containing approximately 152,000 images. Our experimental results demonstrate that state-of-the-art group robustness methods struggle with Spawrious, most notably on the Hard-splits with $<60\%$ accuracy. By examining model misclassifications, we detect reliances on spurious backgrounds, demonstrating that our dataset provides a significant challenge to drive future research.  ( 2 min )
    Predictive Inference with Feature Conformal Prediction. (arXiv:2210.00173v2 [cs.LG] UPDATED)
    Conformal prediction is a distribution-free technique for establishing valid prediction intervals. Although conventionally people conduct conformal prediction in the output space, this is not the only possibility. In this paper, we propose feature conformal prediction, which extends the scope of conformal prediction to semantic feature spaces by leveraging the inductive bias of deep representation learning. From a theoretical perspective, we demonstrate that feature conformal prediction provably outperforms regular conformal prediction under mild assumptions. Our approach could be combined with not only vanilla conformal prediction, but also other adaptive conformal prediction methods. Apart from experiments on existing predictive inference benchmarks, we also demonstrate the state-of-the-art performance of the proposed methods on large-scale tasks such as ImageNet classification and Cityscapes image segmentation.  ( 2 min )
    Enhancing Knowledge Graph Embedding Models with Semantic-driven Loss Functions. (arXiv:2303.00286v2 [cs.LG] UPDATED)
    Knowledge graph embedding models (KGEMs) are used for various tasks related to knowledge graphs (KGs), including link prediction. They are trained with loss functions that are computed considering a batch of scored triples and their corresponding labels. Traditional approaches consider the label of a triple to be either true or false. However, recent works suggest that all negative triples should not be valued equally. In line with this recent assumption, we posit that semantically valid negative triples might be high-quality negative triples. As such, loss functions should treat them differently from semantically invalid negative ones. To this aim, we propose semantic-driven versions for the three main loss functions for link prediction. In particular, we treat the scores of negative triples differently by injecting background knowledge about relation domains and ranges into the loss functions. In an extensive and controlled experimental setting, we show that the proposed loss functions systematically provide satisfying results on three public benchmark KGs underpinned with different schemas, which demonstrates both the generality and superiority of our proposed approach. In fact, the proposed loss functions do (1) lead to better MRR and Hits@$10$ values, (2) drive KGEMs towards better semantic awareness. This highlights that semantic information globally improves KGEMs, and thus should be incorporated into loss functions. Domains and ranges of relations being largely available in schema-defined KGs, this makes our approach both beneficial and widely usable in practice.  ( 2 min )
    A data science and machine learning approach to continuous analysis of Shakespeare's plays. (arXiv:2301.06024v2 [cs.CL] UPDATED)
    The availability of quantitative methods that can analyze text has provided new ways of examining literature in a manner that was not available in the pre-information era. Here we apply comprehensive machine learning analysis to the work of William Shakespeare. The analysis shows clear change in style of writing over time, with the most significant changes in the sentence length, frequency of adjectives and adverbs, and the sentiments expressed in the text. Applying machine learning to make a stylometric prediction of the year of the play shows a Pearson correlation of 0.71 between the actual and predicted year, indicating that Shakespeare's writing style as reflected by the quantitative measurements changed over time. Additionally, it shows that the stylometrics of some of the plays is more similar to plays written either before or after the year they were written. For instance, Romeo and Juliet is dated 1596, but is more similar in stylometrics to plays written by Shakespeare after 1600. The source code for the analysis is available for free download.
    Provably Safe Reinforcement Learning with Step-wise Violation Constraints. (arXiv:2302.06064v2 [cs.LG] UPDATED)
    In this paper, we investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in that we consider stricter step-wise violation constraints and do not assume the existence of safe actions, making our formulation more suitable for safety-critical applications which need to ensure safety in all decision steps and may not always possess safe actions, e.g., robot control and autonomous driving. We propose a novel algorithm SUCBVI, which guarantees $\widetilde{O}(\sqrt{ST})$ step-wise violation and $\widetilde{O}(\sqrt{H^3SAT})$ regret. Lower bounds are provided to validate the optimality in both violation and regret performance with respect to $S$ and $T$. Moreover, we further study a novel safe reward-free exploration problem with step-wise violation constraints. For this problem, we design an $(\varepsilon,\delta)$-PAC algorithm SRF-UCRL, which achieves nearly state-of-the-art sample complexity $\widetilde{O}((\frac{S^2AH^2}{\varepsilon}+\frac{H^4SA}{\varepsilon^2})(\log(\frac{1}{\delta})+S))$, and guarantees $\widetilde{O}(\sqrt{ST})$ violation during the exploration. The experimental results demonstrate the superiority of our algorithms in safety performance, and corroborate our theoretical results.
    Generalized Balancing Weights via Deep Neural Networks. (arXiv:2211.07533v4 [stat.ML] UPDATED)
    We present generalized balancing weights, Neural Balancing Weights (NBW), to estimate the causal effects for an arbitrary mixture of discrete and continuous interventions. The weights were obtained by directly estimating the density ratio between the source and balanced distributions by optimizing the variational representation of $f$-divergence. For this, we selected $\alpha$-divergence since it has good properties for optimization: It has an estimator whose sample complexity is independent of it's ground truth value and unbiased mini-batch gradients and is advantageous for the vanishing gradient problem. In addition, we provide a method for checking the balance of the distribution changed by the weights. If the balancing is imperfect, the weights can be improved by adding new balancing weights. Our method can be conveniently implemented with any present deep-learning libraries, and weights can be used in most state-of-the-art supervised algorithms. The code for our method is available online.
    DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion. (arXiv:2301.09474v2 [cs.LG] UPDATED)
    Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and posing a challenge for uncovering the geometric structures for learning desired instance representations. To this end, we introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances' information by their interactions. The diffusion process is constrained by descent criteria w.r.t.~a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed as DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitive instance numbers, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks, such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction.
    Matching Map Recovery with an Unknown Number of Outliers. (arXiv:2210.13354v2 [math.ST] UPDATED)
    We consider the problem of finding the matching map between two sets of $d$-dimensional noisy feature-vectors. The distinctive feature of our setting is that we do not assume that all the vectors of the first set have their corresponding vector in the second set. If $n$ and $m$ are the sizes of these two sets, we assume that the matching map that should be recovered is defined on a subset of unknown cardinality $k^*\le \min(n,m)$. We show that, in the high-dimensional setting, if the signal-to-noise ratio is larger than $5(d\log(4nm/\alpha))^{1/4}$, then the true matching map can be recovered with probability $1-\alpha$. Interestingly, this threshold does not depend on $k^*$ and is the same as the one obtained in prior work in the case of $k = \min(n,m)$. The procedure for which the aforementioned property is proved is obtained by a data-driven selection among candidate mappings $\{\hat\pi_k:k\in[\min(n,m)]\}$. Each $\hat\pi_k$ minimizes the sum of squares of distances between two sets of size $k$. The resulting optimization problem can be formulated as a minimum-cost flow problem, and thus solved efficiently. Finally, we report the results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the algorithms studied in this work.
    Fairness in Forecasting of Observations of Linear Dynamical Systems. (arXiv:2209.05274v3 [cs.LG] UPDATED)
    In machine learning, training data often capture the behaviour of multiple subgroups of some underlying human population. When the nature of training data for subgroups are not controlled carefully, under-representation bias arises. To counter this effect we introduce two natural notions of subgroup fairness and instantaneous fairness to address such under-representation bias in time-series forecasting problems. Here we show globally convergent methods for the fairness-constrained learning problems using hierarchies of convexifications of non-commutative polynomial optimisation problems. Our empirical results on a biased data set motivated by insurance applications and the well-known COMPAS data set demonstrate the efficacy of our methods. We also show that by exploiting sparsity in the convexifications, we can reduce the run time of our methods considerably.
    Deep network series for large-scale high-dynamic range imaging. (arXiv:2210.16060v2 [astro-ph.IM] UPDATED)
    We propose a new approach for large-scale high-dynamic range computational imaging. Deep Neural Networks (DNNs) trained end-to-end can solve linear inverse imaging problems almost instantaneously. While unfolded architectures provide robustness to measurement setting variations, embedding large-scale measurement operators in DNN architectures is impractical. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, have proven effective to address scalability and high-dynamic range challenges, but rely on highly iterative algorithms. We propose a residual DNN series approach, also interpretable as a learned version of matching pursuit, where the reconstructed image is a sum of residual images progressively increasing the dynamic range, and estimated iteratively by DNNs taking the back-projected data residual of the previous iteration as input. We demonstrate on radio-astronomical imaging simulations that a series of only few terms provides a reconstruction quality competitive with PnP, at a fraction of the cost.
    Bayesian Weapon System Reliability Modeling with Cox-Weibull Neural Network. (arXiv:2301.01850v4 [stat.AP] UPDATED)
    We propose to integrate weapon system features (such as weapon system manufacturer, deployment time and location, storage time and location, etc.) into a parameterized Cox-Weibull [1] reliability model via a neural network, like DeepSurv [2], to improve predictive maintenance. In parallel, we develop an alternative Bayesian model by parameterizing the Weibull parameters with a neural network and employing dropout methods such as Monte-Carlo (MC)-dropout for comparative purposes. Due to data collection procedures in weapon system testing we employ a novel interval-censored log-likelihood which incorporates Monte-Carlo Markov Chain (MCMC) [3] sampling of the Weibull parameters during gradient descent optimization. We compare classification metrics such as receiver operator curve (ROC) area under the curve (AUC), precision-recall (PR) AUC, and F scores to show our model generally outperforms traditional powerful models such as XGBoost and the current standard conditional Weibull probability density estimation model.
    TDSTF: Transformer-based Diffusion probabilistic model for Sparse Time series Forecasting. (arXiv:2301.06625v2 [cs.LG] UPDATED)
    \noindent \textbf{Background and objective:} In the intensive care unit (ICU), vital sign monitoring is critical, and an accurate predictive system is required. This study will create a novel model to forecast Heart Rate (HR), Systolic Blood Pressure (SBP), and Diastolic Blood Pressure (DBP) in ICU. These vital signs are crucial for prompt interventions for patients. We extracted $24,886$ ICU stays from the MIMIC-III database, which contains data from over $46$ thousand patients, to train and test the model. \noindent \textbf{Methods:} The model proposed in this study, areansformerin intensive careabilistic Model for Sparse Time Series Forecasting (TDSTF), uses a deep learning technique called the Transformer. The TDSTF model showed state-of-the-art performance in predicting vital signs in the ICU, outperforming other models' ability to predict distributions of vital signs and being more computationally efficient. The code is available at https://github.com/PingChang818/TDSTF. \noindent \textbf{Results:} The results of the study showed that TDSTF achieved a Normalized Average Continuous Ranked Probability Score (NACRPS) of $0.4438$ and a Mean Squared Error (MSE) of $0.4168$, an improvement of $18.9\%$ and $34.3\%$ over the best baseline model, respectively. \noindent \textbf{Conclusion:} In conclusion, TDSTF is an effective and efficient solution for forecasting vital signs in the ICU, and it shows a significant improvement compared to other models in the field. \noindent \textbf{Keywords: deep learning, time series forecasting, sparse data, vital signs, ICU}
    Efficient Recovery Learning using Model Predictive Meta-Reasoning. (arXiv:2209.13605v2 [cs.RO] UPDATED)
    Operating under real world conditions is challenging due to the possibility of a wide range of failures induced by execution errors and state uncertainty. In relatively benign settings, such failures can be overcome by retrying or executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, like opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm called Meta-Reasoning for Skill Learning (MetaReSkill) that monitors the progress of all recovery policies during training and allocates training resources to recoveries that are likely to improve the task performance the most. We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning. Compared to open-loop execution, our experiments show that even a limited amount of recovery learning improves task success substantially from 71% to 92.4% in simulation and from 75% to 90% on a real robot.
    Sparse and Local Networks for Hypergraph Reasoning. (arXiv:2303.05496v1 [cs.LG])
    Reasoning about the relationships between entities from input facts (e.g., whether Ari is a grandparent of Charlie) generally requires explicit consideration of other entities that are not mentioned in the query (e.g., the parents of Charlie). In this paper, we present an approach for learning to solve problems of this kind in large, real-world domains, using sparse and local hypergraph neural networks (SpaLoc). SpaLoc is motivated by two observations from traditional logic-based reasoning: relational inferences usually apply locally (i.e., involve only a small number of individuals), and relations are usually sparse (i.e., only hold for a small percentage of tuples in a domain). We exploit these properties to make learning and inference efficient in very large domains by (1) using a sparse tensor representation for hypergraph neural networks, (2) applying a sparsification loss during training to encourage sparse representations, and (3) subsampling based on a novel information sufficiency-based sampling process during training. SpaLoc achieves state-of-the-art performance on several real-world, large-scale knowledge graph reasoning benchmarks, and is the first framework for applying hypergraph neural networks on real-world knowledge graphs with more than 10k nodes.
    CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning. (arXiv:2211.08229v3 [cs.CR] UPDATED)
    Contrastive learning (CL) pre-trains general-purpose encoders using an unlabeled pre-training dataset, which consists of images or image-text pairs. CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so the encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we propose new DPBAs called CorruptEncoder to CL. CorruptEncoder uses a theory-guided method to create optimal poisoned inputs to maximize attack effectiveness. Our experiments show that CorruptEncoder substantially outperforms existing DPBAs. In particular, CorruptEncoder is the first DPBA that achieves more than 90% attack success rates with only a few (3) reference images and a small poisoning ratio (0.5%). Moreover, we also propose a defense, called localized cropping, to defend against DPBAs. Our results show that our defense can reduce the effectiveness of DPBAs, though it slightly sacrifices the utility of the encoder.
    Optimal Algorithms for Latent Bandits with Cluster Structure. (arXiv:2301.07040v2 [cs.LG] UPDATED)
    We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into \emph{latent} clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late \cite{gentile2014online, maillard2014latent}. Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathsf{MNT}})$ is unavoidable, where $\mathsf{M}, \mathsf{N}$ are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$, when the number of clusters is $\widetilde{O}(1)$. This is the first algorithm to guarantee such strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log{\mathsf{T}})$ calls to an offline matrix completion oracle across all $\mathsf{T}$ rounds.
    LidarCLIP or: How I Learned to Talk to Point Clouds. (arXiv:2212.06858v2 [cs.CV] UPDATED)
    Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL-E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, prohibited by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging detection scenarios under adverse sensor conditions. We also explore zero-shot classification and show that LidarCLIP outperforms existing attempts to use CLIP for point clouds by a large margin. Finally, we leverage our compatibility with CLIP to explore a range of applications, such as point cloud captioning and lidar-to-image generation, without any additional training. Code and pre-trained models are available at https://github.com/atonderski/lidarclip.
    On the Robustness of Dataset Inference. (arXiv:2210.13631v2 [cs.LG] UPDATED)
    Machine learning (ML) models are costly to train as they can require a significant amount of data, computational resources and technical expertise. Thus, they constitute valuable intellectual property that needs protection from adversaries wanting to steal them. Ownership verification techniques allow the victims of model stealing attacks to demonstrate that a suspect model was in fact stolen from theirs. Although a number of ownership verification techniques based on watermarking or fingerprinting have been proposed, most of them fall short either in terms of security guarantees (well-equipped adversaries can evade verification) or computational cost. A fingerprinting technique introduced at ICLR '21, Dataset Inference (DI), has been shown to offer better robustness and efficiency than prior methods. The authors of DI provided a correctness proof for linear (suspect) models. However, in the same setting, we prove that DI suffers from high false positives (FPs) -- it can incorrectly identify an independent model trained with non-overlapping data from the same distribution as stolen. We further prove that DI also triggers FPs in realistic, non-linear suspect models. We then confirm empirically that DI leads to FPs, with high confidence. Second, we show that DI also suffers from false negatives (FNs) -- an adversary can fool DI by regularising a stolen model's decision boundaries using adversarial training, thereby leading to an FN. To this end, we demonstrate that DI fails to identify a model adversarially trained from a stolen dataset -- the setting where DI is the hardest to evade. Finally, we discuss the implications of our findings, the viability of fingerprinting-based ownership verification in general, and suggest directions for future work.
    Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds. (arXiv:2207.00531v3 [cs.CV] UPDATED)
    Masked autoencoding has become a successful pretraining paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they generally are cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code available at https://github.com/georghess/voxel-mae
    Benchmarking AutoML algorithms on a collection of synthetic classification problems. (arXiv:2212.02704v3 [cs.LG] UPDATED)
    Automated machine learning (AutoML) algorithms have grown in popularity due to their high performance and flexibility to adapt to different problems and data sets. With the increasing number of AutoML algorithms, deciding which would best suit a given problem becomes increasingly more work. Therefore, it is essential to use complex and challenging benchmarks which would be able to differentiate the AutoML algorithms from each other. This paper compares the performance of four different AutoML algorithms: Tree-based Pipeline Optimization Tool (TPOT), Auto-Sklearn, Auto-Sklearn 2, and H2O AutoML. We use the Diverse and Generative ML benchmark (DIGEN), a diverse set of synthetic datasets derived from generative functions designed to highlight the strengths and weaknesses of the performance of common machine learning algorithms. We confirm that AutoML can identify pipelines that perform well on all included datasets. Most AutoML algorithms performed similarly; however, there were some differences depending on the specific dataset and metric used.
    Scaling up GANs for Text-to-Image Synthesis. (arXiv:2303.05511v1 [cs.CV])
    The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that na\"Ively increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel pixels in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
    Certified Training: Small Boxes are All You Need. (arXiv:2210.04871v2 [cs.LG] UPDATED)
    To obtain, deterministic guarantees of adversarial robustness, specialized training methods are used. We propose, SABR, a novel such certified training method, based on the key insight that propagating interval bounds for a small but carefully selected subset of the adversarial input region is sufficient to approximate the worst-case loss over the whole region while significantly reducing approximation errors. We show in an extensive empirical evaluation that SABR outperforms existing certified defenses in terms of both standard and certifiable accuracies across perturbation magnitudes and datasets, pointing to a new class of certified training methods promising to alleviate the robustness-accuracy trade-off.
    PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification. (arXiv:2303.05512v1 [cs.CV])
    Existing approaches to system identification (estimating the physical parameters of an object) from videos assume known object geometries. This precludes their applicability in a vast majority of scenes where object geometries are complex or unknown. In this work, we aim to identify parameters characterizing a physical system from a set of multi-view videos without any assumption on object geometry or topology. To this end, we propose "Physics Augmented Continuum Neural Radiance Fields" (PAC-NeRF), to estimate both the unknown geometry and physical parameters of highly dynamic objects from multi-view videos. We design PAC-NeRF to only ever produce physically plausible states by enforcing the neural radiance field to follow the conservation laws of continuum mechanics. For this, we design a hybrid Eulerian-Lagrangian representation of the neural radiance field, i.e., we use the Eulerian grid representation for NeRF density and color fields, while advecting the neural radiance fields via Lagrangian particles. This hybrid Eulerian-Lagrangian representation seamlessly blends efficient neural rendering with the material point method (MPM) for robust differentiable physics simulation. We validate the effectiveness of our proposed framework on geometry and physical parameter estimation over a vast range of materials, including elastic bodies, plasticine, sand, Newtonian and non-Newtonian fluids, and demonstrate significant performance gain on most tasks.
    Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis. (arXiv:2208.10609v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) are highly effective on a variety of graph-related tasks; however, they lack interpretability and transparency. Current explainability approaches are typically local and treat GNNs as black-boxes. They do not look inside the model, inhibiting human trust in the model and explanations. Motivated by the ability of neurons to detect high-level semantic concepts in vision models, we perform a novel analysis on the behaviour of individual GNN neurons to answer questions about GNN interpretability, and propose new metrics for evaluating the interpretability of GNN neurons. We propose a novel approach for producing global explanations for GNNs using neuron-level concepts to enable practitioners to have a high-level view of the model. Specifically, (i) to the best of our knowledge, this is the first work which shows that GNN neurons act as concept detectors and have strong alignment with concepts formulated as logical compositions of node degree and neighbourhood properties; (ii) we quantitatively assess the importance of detected concepts, and identify a trade-off between training duration and neuron-level interpretability; (iii) we demonstrate that our global explainability approach has advantages over the current state-of-the-art -- we can disentangle the explanation into individual interpretable concepts backed by logical descriptions, which reduces potential for bias and improves user-friendliness.
    Asynchronous Hybrid Reinforcement Learning for Latency and Reliability Optimization in the Metaverse over Wireless Communications. (arXiv:2212.14749v2 [cs.LG] UPDATED)
    Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for the Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual objects is computationally intensive and requires computation offloading. The disparity in transmitted object dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world images captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual objects need to be transmitted back to the XUs. We design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC), to optimize the decisions pertaining to computation offloading and channel assignment in the UL stage and optimize the DL transmission power in the DL stage. Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with satisfactory training time.
    A Survey on Federated Recommendation Systems. (arXiv:2301.00767v2 [cs.IR] UPDATED)
    Federated learning has recently been applied to recommendation systems to protect user privacy. In federated learning settings, recommendation systems can train recommendation models only collecting the intermediate parameters instead of the real user data, which greatly enhances the user privacy. Beside, federated recommendation systems enable to collaborate with other data platforms to improve recommended model performance while meeting the regulation and privacy constraints. However, federated recommendation systems faces many new challenges such as privacy, security, heterogeneity and communication costs. While significant research has been conducted in these areas, gaps in the surveying literature still exist. In this survey, we-(1) summarize some common privacy mechanisms used in federated recommendation systems and discuss the advantages and limitations of each mechanism; (2) review some robust aggregation strategies and several novel attacks against security; (3) summarize some approaches to address heterogeneity and communication costs problems; (4)introduce some open source platforms that can be used to build federated recommendation systems; (5) present some prospective research directions in the future. This survey can guide researchers and practitioners understand the research progress in these areas.
    Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalisation. (arXiv:2303.05161v1 [cs.LG])
    To achieve near-zero training error in a classification problem, the layers of a deep network have to disentangle the manifolds of data points with different labels, to facilitate the discrimination. However, excessive class separation can bring to overfitting since good generalisation requires learning invariant features, which involve some level of entanglement. We report on numerical experiments showing how the optimisation dynamics finds representations that balance these opposing tendencies with a non-monotonic trend. After a fast segregation phase, a slower rearrangement (conserved across data sets and architectures) increases the class entanglement. The training error at the inversion is remarkably stable under subsampling, and across network initialisations and optimisers, which characterises it as a property solely of the data structure and (very weakly) of the architecture. The inversion is the manifestation of tradeoffs elicited by well-defined and maximally stable elements of the training set, coined "stragglers", particularly influential for generalisation.
    Semantics-Native Communication with Contextual Reasoning. (arXiv:2108.05681v2 [cs.IT] UPDATED)
    Spurred by a huge interest in the post-Shannon communication, it has recently been shown that leveraging semantics can significantly improve the communication effectiveness across many tasks. In this article, inspired by human communication, we propose a novel stochastic model of System 1 semantics-native communication (SNC) for generic tasks, where a speaker has an intention of referring to an entity, extracts the semantics, and communicates its symbolic representation to a target listener. To further reach its full potential, we additionally infuse contextual reasoning into SNC such that the speaker locally and iteratively self-communicates with a virtual agent built on the physical listener's unique way of coding its semantics, i.e., communication context. The resultant System 2 SNC allows the speaker to extract the most effective semantics for its listener. Leveraging the proposed stochastic model, we show that the reliability of System 2 SNC increases with the number of meaningful concepts, and derive the expected semantic representation (SR) bit length which quantifies the extracted effective semantics. It is also shown that System 2 SNC significantly reduces the SR length without compromising communication reliability.
    SAM as an Optimal Relaxation of Bayes. (arXiv:2210.01620v2 [cs.LG] UPDATED)
    Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can drastically improve generalization, but their underlying mechanisms are not yet fully understood. Here, we establish SAM as a relaxation of the Bayes objective where the expected negative-loss is replaced by the optimal convex lower bound, obtained by using the so-called Fenchel biconjugate. The connection enables a new Adam-like extension of SAM to automatically obtain reasonable uncertainty estimates, while sometimes also improving its accuracy. By connecting adversarial and Bayesian methods, our work opens a new path to robustness.
    Open-vocabulary Attribute Detection. (arXiv:2211.12914v2 [cs.CV] UPDATED)
    Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enables open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute detection performance of several foundation models. Project page https://ovad-benchmark.github.io
    Natural Gradient Methods: Perspectives, Efficient-Scalable Approximations, and Analysis. (arXiv:2303.05473v1 [cs.LG])
    Natural Gradient Descent, a second-degree optimization method motivated by the information geometry, makes use of the Fisher Information Matrix instead of the Hessian which is typically used. However, in many cases, the Fisher Information Matrix is equivalent to the Generalized Gauss-Newton Method, that both approximate the Hessian. It is an appealing method to be used as an alternative to stochastic gradient descent, potentially leading to faster convergence. However, being a second-order method makes it infeasible to be used directly in problems with a huge number of parameters and data. This is evident from the community of deep learning sticking with the stochastic gradient descent method since the beginning. In this paper, we look at the different perspectives on the natural gradient method, study the current developments on its efficient-scalable empirical approximations, and finally examine their performance with extensive experiments.
    PDSketch: Integrated Planning Domain Programming and Learning. (arXiv:2303.05501v1 [cs.AI])
    This paper studies a model learning and online planning approach towards building flexible and general robots. Specifically, we investigate how to exploit the locality and sparsity structures in the underlying environmental transition model to improve model generalization, data-efficiency, and runtime-efficiency. We present a new domain definition language, named PDSketch. It allows users to flexibly define high-level structures in the transition models, such as object and feature dependencies, in a way similar to how programmers use TensorFlow or PyTorch to specify kernel sizes and hidden dimensions of a convolutional neural network. The details of the transition model will be filled in by trainable neural networks. Based on the defined structures and learned parameters, PDSketch automatically generates domain-independent planning heuristics without additional training. The derived heuristics accelerate the performance-time planning for novel goals.
    Planning with Large Language Models for Code Generation. (arXiv:2303.05510v1 [cs.LG])
    Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process. Although the programs they generate achieve high token-matching-based scores, they often fail to compile or generate incorrect outputs. The main reason is that conventional Transformer decoding algorithms may not be the best choice for code generation. In this work, we propose a novel Transformer decoding algorithm, Planning-Guided Transformer Decoding (PG-TD), that uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs. Specifically, instead of simply optimizing the likelihood of the generated sequences, the Transformer makes use of a planner to generate candidate programs and test them on public test cases. The Transformer can therefore make more informed decisions and generate tokens that will eventually lead to higher-quality programs. We also design a mechanism that shares information between the Transformer and the planner to make our algorithm computationally efficient. We empirically evaluate our framework with several large language models as backbones on public coding challenge benchmarks, showing that 1) it can generate programs that consistently achieve higher performance compared with competing baseline methods; 2) it enables controllable code generation, such as concise codes and highly-commented codes by optimizing modified objective.
    PC-JeDi: Diffusion for Particle Cloud Generation in High Energy Physics. (arXiv:2303.05376v1 [hep-ph])
    In this paper, we present a new method to efficiently generate jets in High Energy Physics called PC-JeDi. This method utilises score-based diffusion models in conjunction with transformers which are well suited to the task of generating jets as particle clouds due to their permutation equivariance. PC-JeDi achieves competitive performance with current state-of-the-art methods across several metrics that evaluate the quality of the generated jets. Although slower than other models, due to the large number of forward passes required by diffusion models, it is still substantially faster than traditional detailed simulation. Furthermore, PC-JeDi uses conditional generation to produce jets with a desired mass and transverse momentum for two different particles, top quarks and gluons.
    CoolPINNs: A Physics-informed Neural Network Modeling of Active Cooling in Vascular Systems. (arXiv:2303.05300v1 [math.NA])
    Emerging technologies like hypersonic aircraft, space exploration vehicles, and batteries avail fluid circulation in embedded microvasculatures for efficient thermal regulation. Modeling is vital during these engineered systems' design and operational phases. However, many challenges exist in developing a modeling framework. What is lacking is an accurate framework that (i) captures sharp jumps in the thermal flux across complex vasculature layouts, (ii) deals with oblique derivatives (involving tangential and normal components), (iii) handles nonlinearity because of radiative heat transfer, (iv) provides a high-speed forecast for real-time monitoring, and (v) facilitates robust inverse modeling. This paper addresses these challenges by availing the power of physics-informed neural networks (PINNs). We develop a fast, reliable, and accurate Scientific Machine Learning (SciML) framework for vascular-based thermal regulation -- called CoolPINNs: a PINNs-based modeling framework for active cooling. The proposed mesh-less framework elegantly overcomes all the mentioned challenges. The significance of the reported research is multi-fold. First, the framework is valuable for real-time monitoring of thermal regulatory systems because of rapid forecasting. Second, researchers can address complex thermoregulation designs inasmuch as the approach is mesh-less. Finally, the framework facilitates systematic parameter identification and inverse modeling studies, perhaps the current framework's most significant utility.
    Part-Based Models Improve Adversarial Robustness. (arXiv:2209.09117v2 [cs.CV] UPDATED)
    We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification. We believe that the richer form of annotation helps guide neural networks to learn more robust features without requiring more samples or larger models. Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and then classify the segmented object. Empirically, our part-based models achieve both higher accuracy and higher adversarial robustness than a ResNet-50 baseline on all three datasets. For instance, the clean accuracy of our part models is up to 15 percentage points higher than the baseline's, given the same level of robustness. Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations. The code is publicly available at https://github.com/chawins/adv-part-model.
    Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning. (arXiv:2205.01992v3 [cs.LG] UPDATED)
    The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, assuming that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model's performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field in the last 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer-vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research in poisoning, and shed light on the current limitations and open research questions in this research field.
    Synthesizer Preset Interpolation using Transformer Auto-Encoders. (arXiv:2210.16984v2 [cs.SD] UPDATED)
    Sound synthesizers are widespread in modern music production but they increasingly require expert skills to be mastered. This work focuses on interpolation between presets, i.e., sets of values of all sound synthesis parameters, to enable the intuitive creation of new sounds from existing ones. We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi-head attention blocks, and audio using convolutions. This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters. Experiments have compared the model to related architectures and methods, and have demonstrated that it performs smoother interpolations. After training, the proposed model can be integrated into commercial synthesizers for live interpolation or sound design tasks.
    Structure-Aware Group Discrimination with Adaptive-View Graph Encoder: A Fast Graph Contrastive Learning Framework. (arXiv:2303.05231v1 [cs.LG])
    Albeit having gained significant progress lately, large-scale graph representation learning remains expensive to train and deploy for two main reasons: (i) the repetitive computation of multi-hop message passing and non-linearity in graph neural networks (GNNs); (ii) the computational cost of complex pairwise contrastive learning loss. Two main contributions are made in this paper targeting this twofold challenge: we first propose an adaptive-view graph neural encoder (AVGE) with a limited number of message passing to accelerate the forward pass computation, and then we propose a structure-aware group discrimination (SAGD) loss in our framework which avoids inefficient pairwise loss computing in most common GCL and improves the performance of the simple group discrimination. By the framework proposed, we manage to bring down the training and inference cost on various large-scale datasets by a significant margin (250x faster inference time) without loss of the downstream-task performance.
    Disentangling representations in Restricted Boltzmann Machines without adversaries. (arXiv:2206.11600v4 [cs.LG] UPDATED)
    A goal of unsupervised machine learning is to build representations of complex high-dimensional data, with simple relations to their properties. Such disentangled representations make easier to interpret the significant latent factors of variation in the data, as well as to generate new data with desirable features. Methods for disentangling representations often rely on an adversarial scheme, in which representations are tuned to avoid discriminators from being able to reconstruct information about the data properties (labels). Unfortunately adversarial training is generally difficult to implement in practice. Here we propose a simple, effective way of disentangling representations without any need to train adversarial discriminators, and apply our approach to Restricted Boltzmann Machines (RBM), one of the simplest representation-based generative models. Our approach relies on the introduction of adequate constraints on the weights during training, which allows us to concentrate information about labels on a small subset of latent variables. The effectiveness of the approach is illustrated with four examples: the CelebA dataset of facial images, the two-dimensional Ising model, the MNIST dataset of handwritten digits, and the taxonomy of protein families. In addition, we show how our framework allows for analytically computing the cost, in terms of log-likelihood of the data, associated to the disentanglement of their representations.
    Causal Confusion and Reward Misidentification in Preference-Based Reward Learning. (arXiv:2204.06601v3 [cs.LG] UPDATED)
    Learning policies via preference-based reward learning is an increasingly popular method for customizing agent behavior, but has been shown anecdotally to be prone to spurious correlations and reward hacking behaviors. While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification when learning from preferences. In particular, we perform a series of sensitivity and ablation analyses on several benchmark domains where rewards learned from preferences achieve minimal test error but fail to generalize to out-of-distribution states -- resulting in poor policy performance when optimized. We find that the presence of non-causal distractor features, noise in the stated preferences, and partial state observability can all exacerbate reward misidentification. We also identify a set of methods with which to interpret misidentified learned rewards. In general, we observe that optimizing misidentified rewards drives the policy off the reward's training distribution, resulting in high predicted (learned) rewards but low true rewards. These findings illuminate the susceptibility of preference learning to reward misidentification and causal confusion -- failure to consider even one of many factors can result in unexpected, undesirable behavior.
    Optimizing Sparse Linear Algebra Through Automatic Format Selection and Machine Learning. (arXiv:2303.05098v1 [cs.LG])
    Sparse matrices are an integral part of scientific simulations. As hardware evolves new sparse matrix storage formats are proposed aiming to exploit optimizations specific to the new hardware. In the era of heterogeneous computing, users often are required to use multiple formats for their applications to remain optimal across the different available hardware, resulting in larger development times and maintenance overhead. A potential solution to this problem is the use of a lightweight auto-tuner driven by Machine Learning (ML) that would select for the user an optimal format from a pool of available formats that will match the characteristics of the sparsity pattern, target hardware and operation to execute. In this paper, we introduce Morpheus-Oracle, a library that provides a lightweight ML auto-tuner capable of accurately predicting the optimal format across multiple backends, targeting the major HPC architectures aiming to eliminate any format selection input by the end-user. From more than 2000 real-life matrices, we achieve an average classification accuracy and balanced accuracy of 92.63% and 80.22% respectively across the available systems. The adoption of the auto-tuner results in average speedup of 1.1x on CPUs and 1.5x to 8x on NVIDIA and AMD GPUs, with maximum speedups reaching up to 7x and 1000x respectively.
    SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion. (arXiv:2209.03855v3 [cs.RO] UPDATED)
    Multi-objective optimization problems are ubiquitous in robotics, e.g., the optimization of a robot manipulation task requires a joint consideration of grasp pose configurations, collisions and joint limits. While some demands can be easily hand-designed, e.g., the smoothness of a trajectory, several task-specific objectives need to be learned from data. This work introduces a method for learning data-driven SE(3) cost functions as diffusion models. Diffusion models can represent highly-expressive multimodal distributions and exhibit proper gradients over the entire space due to their score-matching training objective. Learning costs as diffusion models allows their seamless integration with other costs into a single differentiable objective function, enabling joint gradient-based motion optimization. In this work, we focus on learning SE(3) diffusion models for 6DoF grasping, giving rise to a novel framework for joint grasp and motion optimization without needing to decouple grasp selection from trajectory generation. We evaluate the representation power of our SE(3) diffusion models w.r.t. classical generative models, and we showcase the superior performance of our proposed optimization framework in a series of simulated and real-world robotic manipulation tasks against representative baselines.
    Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning. (arXiv:2202.11202v3 [cs.LG] UPDATED)
    Indiscriminate data poisoning attacks are quite effective against supervised learning. However, not much is known about their impact on unsupervised contrastive learning (CL). This paper is the first to consider indiscriminate poisoning attacks of contrastive learning. We propose Contrastive Poisoning (CP), the first effective such attack on CL. We empirically show that Contrastive Poisoning, not only drastically reduces the performance of CL algorithms, but also attacks supervised learning models, making it the most generalizable indiscriminate poisoning attack. We also show that CL algorithms with a momentum encoder are more robust to indiscriminate poisoning, and propose a new countermeasure based on matrix completion. Code is available at: https://github.com/kaiwenzha/contrastive-poisoning.
    A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta. (arXiv:2206.11124v2 [cs.LG] UPDATED)
    Mini-batch SGD with momentum is a fundamental algorithm for learning large predictive models. In this paper we develop a new analytic framework to analyze noise-averaged properties of mini-batch SGD for linear models at constant learning rates, momenta and sizes of batches. Our key idea is to consider the dynamics of the second moments of model parameters for a special family of "Spectrally Expressible" approximations. This allows to obtain an explicit expression for the generating function of the sequence of loss values. By analyzing this generating function, we find, in particular, that 1) the SGD dynamics exhibits several convergent and divergent regimes depending on the spectral distributions of the problem; 2) the convergent regimes admit explicit stability conditions, and explicit loss asymptotics in the case of power-law spectral distributions; 3) the optimal convergence rate can be achieved at negative momenta. We verify our theoretical predictions by extensive experiments with MNIST, CIFAR10 and synthetic problems, and find a good quantitative agreement.
    Semi-Federated Learning for Collaborative Intelligence in Massive IoT Networks. (arXiv:2303.05048v1 [cs.LG])
    Implementing existing federated learning in massive Internet of Things (IoT) networks faces critical challenges such as imbalanced and statistically heterogeneous data and device diversity. To this end, we propose a semi-federated learning (SemiFL) framework to provide a potential solution for the realization of intelligent IoT. By seamlessly integrating the centralized and federated paradigms, our SemiFL framework shows high scalability in terms of the number of IoT devices even in the presence of computing-limited sensors. Furthermore, compared to traditional learning approaches, the proposed SemiFL can make better use of distributed data and computing resources, due to the collaborative model training between the edge server and local devices. Simulation results show the effectiveness of our SemiFL framework for massive IoT networks. The code can be found at https://github.com/niwanli/SemiFL_IoT.
    TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization. (arXiv:2303.05506v1 [cs.LG])
    Despite their success with unstructured data, deep neural networks are not yet a panacea for structured tabular data. In the tabular domain, their efficiency crucially relies on various forms of regularization to prevent overfitting and provide strong generalization performance. Existing regularization techniques include broad modelling decisions such as choice of architecture, loss functions, and optimization methods. In this work, we introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS), a novel framework for regularization in the tabular setting built on latent unit attributions. The gradient attribution of an activation with respect to a given input feature suggests how the neuron attends to that feature, and is often employed to interpret the predictions of deep networks. In TANGOS, we take a different approach and incorporate neuron attributions directly into training to encourage orthogonalization and specialization of latent attributions in a fully-connected network. Our regularizer encourages neurons to focus on sparse, non-overlapping input features and results in a set of diverse and specialized latent units. In the tabular domain, we demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods. We provide insight into why our regularizer is effective and demonstrate that TANGOS can be applied jointly with existing methods to achieve even greater generalization performance.
    Generalization Bounds via Information Density and Conditional Information Density. (arXiv:2005.08044v6 [cs.LG] UPDATED)
    We present a general approach, based on exponential inequalities, to derive bounds on the generalization error of randomized learning algorithms. Using this approach, we provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios. Specifically, for the case of sub-Gaussian loss functions, we obtain novel bounds that depend on the information density between the training data and the output hypothesis. When suitably weakened, these bounds recover many of the information-theoretic bounds available in the literature. We also extend the proposed exponential-inequality approach to the setting recently introduced by Steinke and Zakynthinou (2020), where the learning algorithm depends on a randomly selected subset of the available training data. For this setup, we present bounds for bounded loss functions in terms of the conditional information density between the output hypothesis and the random variable determining the subset choice, given all training data. Through our approach, we recover the average generalization bound presented by Steinke and Zakynthinou (2020) and extend it to the PAC-Bayesian and single-draw scenarios. For the single-draw scenario, we also obtain novel bounds in terms of the conditional $\alpha$-mutual information and the conditional maximal leakage.
    Local Convolutions Cause an Implicit Bias towards High Frequency Adversarial Examples. (arXiv:2006.11440v5 [stat.ML] UPDATED)
    Adversarial Attacks are still a significant challenge for neural networks. Recent work has shown that adversarial perturbations typically contain high-frequency features, but the root cause of this phenomenon remains unknown. Inspired by theoretical work on linear full-width convolutional models, we hypothesize that the local (i.e. bounded-width) convolutional operations commonly used in current neural networks are implicitly biased to learn high frequency features, and that this is one of the root causes of high frequency adversarial examples. To test this hypothesis, we analyzed the impact of different choices of linear and nonlinear architectures on the implicit bias of the learned features and the adversarial perturbations, in both spatial and frequency domains. We find that the high-frequency adversarial perturbations are critically dependent on the convolution operation because the spatially-limited nature of local convolutions induces an implicit bias towards high frequency features. The explanation for the latter involves the Fourier Uncertainty Principle: a spatially-limited (local in the space domain) filter cannot also be frequency-limited (local in the frequency domain). Furthermore, using larger convolution kernel sizes or avoiding convolutions (e.g. by using Vision Transformers architecture) significantly reduces this high frequency bias, but not the overall susceptibility to attacks. Looking forward, our work strongly suggests that understanding and controlling the implicit bias of architectures will be essential for achieving adversarial robustness.
    Continual Learning for Monolingual End-to-End Automatic Speech Recognition. (arXiv:2112.09427v4 [eess.AS] UPDATED)
    Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of performance on the original domain(s), a phenomenon called Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from CF, making them unable to be continually enhanced without storing all past data. Fortunately, Continual Learning (CL) methods, which aim to enable continual adaptation while overcoming CF, can be used. In this paper, we implement an extensive number of CL methods for End-to-End ASR and test and compare their ability to extend a monolingual Hybrid CTC-Transformer model across four new tasks. We find that the best performing CL method closes the gap between the fine-tuned model (lower bound) and the model trained jointly on all tasks (upper bound) by more than 40%, while requiring access to only 0.6% of the original data.
    More is Less: Inducing Sparsity via Overparameterization. (arXiv:2112.11027v4 [math.OC] UPDATED)
    In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain understanding of this implicit bias phenomenon we study the special case of sparse recovery (compressed sensing) which is of interest on its own. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, if there exists an exact solution, vanilla gradient flow for the overparameterized loss functional converges to a good approximation of the solution of minimal $\ell_1$-norm. The latter is well-known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressed sensing via gradient flow/descent on overparameterized models derived in previous works. The theory accurately predicts the recovery rate in numerical experiments. Our proof relies on analyzing a certain Bregman divergence of the flow. This bypasses the obstacles caused by non-convexity and should be of independent interest.
    Power and Interference Control for VLC-Based UDN: A Reinforcement Learning Approach. (arXiv:2303.05448v1 [eess.SP])
    Visible light communication (VLC) has been widely applied as a promising solution for modern short range communication. When it comes to the deployment of LED arrays in VLC networks, the emerging ultra-dense network (UDN) technology can be adopted to expand the VLC network's capacity. However, the problem of inter-cell interference (ICI) mitigation and efficient power control in the VLC-based UDN is still a critical challenge. To this end, a reinforcement learning (RL) based VLC UDN architecture is devised in this paper. The deployment of the cells is optimized via spatial reuse to mitigate ICI. An RL-based algorithm is proposed to dynamically optimize the policy of power and interference control, maximizing the system utility in the complicated and dynamic environment. Simulation results demonstrate the superiority of the proposed scheme, it increase the system utility and achievable data rate while reducing the energy consumption and ICI, which outperforms the benchmark scheme.
    Early Warning Signals of Social Instabilities in Twitter Data. (arXiv:2303.05401v1 [cs.CL])
    The goal of this project is to create and study novel techniques to identify early warning signals for socially disruptive events, like riots, wars, or revolutions using only publicly available data on social media. Such techniques need to be robust enough to work on real-time data: to achieve this goal we propose a topological approach together with more standard BERT models. Indeed, topology-based algorithms, being provably stable against deformations and noise, seem to work well in low-data regimes. The general idea is to build a binary classifier that predicts if a given tweet is related to a disruptive event or not. The results indicate that the persistent-gradient approach is stable and even more performant than deep-learning-based anomaly detection algorithms. We also benchmark the generalisability of the methodology against out-of-samples tasks, with very promising results.
    On the Expressiveness and Generalization of Hypergraph Neural Networks. (arXiv:2303.05490v1 [cs.LG])
    This extended abstract describes a framework for analyzing the expressiveness, learning, and (structural) generalization of hypergraph neural networks (HyperGNNs). Specifically, we focus on how HyperGNNs can learn from finite datasets and generalize structurally to graph reasoning problems of arbitrary input sizes. Our first contribution is a fine-grained analysis of the expressiveness of HyperGNNs, that is, the set of functions that they can realize. Our result is a hierarchy of problems they can solve, defined in terms of various hyperparameters such as depths and edge arities. Next, we analyze the learning properties of these neural networks, especially focusing on how they can be trained on a finite set of small graphs and generalize to larger graphs, which we term structural generalization. Our theoretical results are further supported by the empirical results.
    Mark My Words: Dangers of Watermarked Images in ImageNet. (arXiv:2303.05498v1 [cs.LG])
    The utilization of pre-trained networks, especially those trained on ImageNet, has become a common practice in Computer Vision. However, prior research has indicated that a significant number of images in the ImageNet dataset contain watermarks, making pre-trained networks susceptible to learning artifacts such as watermark patterns within their latent spaces. In this paper, we aim to assess the extent to which popular pre-trained architectures display such behavior and to determine which classes are most affected. Additionally, we examine the impact of watermarks on the extracted features. Contrary to the popular belief that the Chinese logographic watermarks impact the "carton" class only, our analysis reveals that a variety of ImageNet classes, such as "monitor", "broom", "apron" and "safe" rely on spurious correlations. Finally, we propose a simple approach to mitigate this issue in fine-tuned networks by ignoring the encodings from the feature-extractor layer of ImageNet pre-trained networks that are most susceptible to watermark imprints.
    Computable Phenotypes to Characterize Changing Patient Brain Dysfunction in the Intensive Care Unit. (arXiv:2303.05504v1 [q-bio.QM])
    In the United States, more than 5 million patients are admitted annually to ICUs, with ICU mortality of 10%-29% and costs over $82 billion. Acute brain dysfunction status, delirium, is often underdiagnosed or undervalued. This study's objective was to develop automated computable phenotypes for acute brain dysfunction states and describe transitions among brain dysfunction states to illustrate the clinical trajectories of ICU patients. We created two single-center, longitudinal EHR datasets for 48,817 adult patients admitted to an ICU at UFH Gainesville (GNV) and Jacksonville (JAX). We developed algorithms to quantify acute brain dysfunction status including coma, delirium, normal, or death at 12-hour intervals of each ICU admission and to identify acute brain dysfunction phenotypes using continuous acute brain dysfunction status and k-means clustering approach. There were 49,770 admissions for 37,835 patients in UFH GNV dataset and 18,472 admissions for 10,982 patients in UFH JAX dataset. In total, 18% of patients had coma as the worst brain dysfunction status; every 12 hours, around 4%-7% would transit to delirium, 22%-25% would recover, 3%-4% would expire, and 67%-68% would remain in a coma in the ICU. Additionally, 7% of patients had delirium as the worst brain dysfunction status; around 6%-7% would transit to coma, 40%-42% would be no delirium, 1% would expire, and 51%-52% would remain delirium in the ICU. There were three phenotypes: persistent coma/delirium, persistently normal, and transition from coma/delirium to normal almost exclusively in first 48 hours after ICU admission. We developed phenotyping scoring algorithms that determined acute brain dysfunction status every 12 hours while admitted to the ICU. This approach may be useful in developing prognostic and decision-support tools to aid patients and clinicians in decision-making on resource use and escalation of care.
    Greener yet Powerful: Taming Large Code Generation Models with Quantization. (arXiv:2303.05378v1 [cs.LG])
    ML-powered code generation aims to assist developers to write code in a more productive manner, by intelligently generating code blocks based on natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their great power, the huge number of model parameters poses a significant threat to adapting them in a regular software development environment, where a developer might use a standard laptop or mid-size server to develop her code. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as carbon footprint. Model compression is a promising approach to address these challenges. Several techniques are proposed to compress large pretrained models typically used for vision or textual data. Out of many available compression techniques, we identified that quantization is mostly applicable for code generation task as it does not require significant retraining cost. As quantization represents model parameters with lower-bit integer (e.g., int8), the model size and runtime latency would both benefit from such int representation. We extensively study the impact of quantized model on code generation tasks across different dimension: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. To this end, through systematic experiments we find a recipe of quantization technique that could run even a $6$B model in a regular laptop without significant accuracy or robustness degradation. We further found the recipe is readily applicable to code summarization task as well.
    A Neurosymbolic Approach to the Verification of Temporal Logic Properties of Learning enabled Control Systems. (arXiv:2303.05394v1 [eess.SY])
    Signal Temporal Logic (STL) has become a popular tool for expressing formal requirements of Cyber-Physical Systems (CPS). The problem of verifying STL properties of neural network-controlled CPS remains a largely unexplored problem. In this paper, we present a model for the verification of Neural Network (NN) controllers for general STL specifications using a custom neural architecture where we map an STL formula into a feed-forward neural network with ReLU activation. In the case where both our plant model and the controller are ReLU-activated neural networks, we reduce the STL verification problem to reachability in ReLU neural networks. We also propose a new approach for neural network controllers with general activation functions; this approach is a sound and complete verification approach based on computing the Lipschitz constant of the closed-loop control system. We demonstrate the practical efficacy of our techniques on a number of examples of learning-enabled control systems.
    Making a Computational Attorney. (arXiv:2303.05383v1 [cs.CL])
    This "blue sky idea" paper outlines the opportunities and challenges in data mining and machine learning involving making a computational attorney -- an intelligent software agent capable of helping human lawyers with a wide range of complex high-level legal tasks such as drafting legal briefs for the prosecution or defense in court. In particular, we discuss what a ChatGPT-like Large Legal Language Model (L$^3$M) can and cannot do today, which will inspire researchers with promising short-term and long-term research objectives.
    Quantum Splines for Non-Linear Approximations. (arXiv:2303.05428v1 [quant-ph])
    Quantum Computing offers a new paradigm for efficient computing and many AI applications could benefit from its potential boost in performance. However, the main limitation is the constraint to linear operations that hampers the representation of complex relationships in data. In this work, we propose an efficient implementation of quantum splines for non-linear approximation. In particular, we first discuss possible parametrisations, and select the most convenient for exploiting the HHL algorithm to obtain the estimates of spline coefficients. Then, we investigate QSpline performance as an evaluation routine for some of the most popular activation functions adopted in ML. Finally, a detailed comparison with classical alternatives to the HHL is also presented.
    Automatically Summarizing Evidence from Clinical Trials: A Prototype Highlighting Current Challenges. (arXiv:2303.05392v1 [cs.CL])
    We present TrialsSummarizer, a system that aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query. Building on prior work, the system retrieves trial publications matching a query specifying a combination of condition, intervention(s), and outcome(s), and ranks these according to sample size and estimated study quality. The top-k such studies are passed through a neural multi-document summarization system, yielding a synopsis of these trials. We consider two architectures: A standard sequence-to-sequence model based on BART and a multi-headed architecture intended to provide greater transparency to end-users. Both models produce fluent and relevant summaries of evidence retrieved for queries, but their tendency to introduce unsupported statements render them inappropriate for use in this domain at present. The proposed architecture may help users verify outputs allowing users to trace generated tokens back to inputs.
    Learning Rational Subgoals from Demonstrations and Instructions. (arXiv:2303.05487v1 [cs.AI])
    We present a framework for learning useful subgoals that support efficient long-term planning to achieve novel goals. At the core of our framework is a collection of rational subgoals (RSGs), which are essentially binary classifiers over the environmental states. RSGs can be learned from weakly-annotated data, in the form of unsegmented demonstration trajectories, paired with abstract task descriptions, which are composed of terms initially unknown to the agent (e.g., collect-wood then craft-boat then go-across-river). Our framework also discovers dependencies between RSGs, e.g., the task collect-wood is a helpful subgoal for the task craft-boat. Given a goal description, the learned subgoals and the derived dependencies facilitate off-the-shelf planning algorithms, such as A* and RRT, by setting helpful subgoals as waypoints to the planner, which significantly improves performance-time efficiency.
    Beware of Instantaneous Dependence in Reinforcement Learning. (arXiv:2303.05458v1 [cs.LG])
    Playing an important role in Model-Based Reinforcement Learning (MBRL), environment models aim to predict future states based on the past. Existing works usually ignore instantaneous dependence in the state, that is, assuming that the future state variables are conditionally independent given the past states. However, instantaneous dependence is prevalent in many RL environments. For instance, in the stock market, instantaneous dependence can exist between two stocks because the fluctuation of one stock can quickly affect the other and the resolution of price change is lower than that of the effect. In this paper, we prove that with few exceptions, ignoring instantaneous dependence can result in suboptimal policy learning in MBRL. To address the suboptimality problem, we propose a simple plug-and-play method to enable existing MBRL algorithms to take instantaneous dependence into account. Through experiments on two benchmarks, we (1) confirm the existence of instantaneous dependence with visualization; (2) validate our theoretical findings that ignoring instantaneous dependence leads to suboptimal policy; (3) verify that our method effectively enables reinforcement learning with instantaneous dependence and improves policy performance.
    Communication-Efficient Collaborative Heterogeneous Bandits in Networks. (arXiv:2303.05445v1 [cs.LG])
    The multi-agent multi-armed bandit problem has been studied extensively due to its ubiquity in many real-life applications, such as online recommendation systems and wireless networking. We consider the setting where agents should minimize their group regret while collaborating over a given graph via some communication protocol and where each agent is given a different set of arms. Previous literature on this problem only considered one of the two desired features separately: agents with the same arm set communicate over a general graph, or agents with different arm sets communicate over a fully connected graph. In this work, we introduce a more general problem setting that encompasses all the desired features. For this novel setting, we first provide a rigorous regret analysis for the standard flooding protocol combined with the UCB policy. Then, to mitigate the issue of high communication costs incurred by flooding, we propose a new protocol called Flooding with Absorption (FWA). We provide a theoretical analysis of the regret bound and intuitions on the advantages of using FWA over flooding. Lastly, we verify empirically that using FWA leads to significantly lower communication costs despite minimal regret performance loss compared to flooding.
    Data-dependent Generalization Bounds via Variable-Size Compressibility. (arXiv:2303.05369v1 [stat.ML])
    In this paper, we establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework that we introduce newly here. In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data. This is shown to yield bounds that depend on the empirical measure of the given input data at hand, rather than its unknown distribution. Our new generalization bounds that we establish are tail bounds, tail bounds on the expectation, and in-expectations bounds. Moreover, it is shown that our framework also allows to derive general bounds on any function of the input data and output hypothesis random variables. In particular, these general bounds are shown to subsume and possibly improve over several existing PAC-Bayes and data-dependent intrinsic dimension-based bounds that are recovered as special cases, thus unveiling a unifying character of our approach. For instance, a new data-dependent intrinsic dimension based bounds is established, which connects the generalization error to the optimization trajectories and reveals various interesting connections with rate-distortion dimension of process, R\'enyi information dimension of process, and metric mean dimension.
    Disambiguation of Company names via Deep Recurrent Networks. (arXiv:2303.05391v1 [cs.CL])
    Name Entity Disambiguation is the Natural Language Processing task of identifying textual records corresponding to the same Named Entity, i.e. real-world entities represented as a list of attributes (names, places, organisations, etc.). In this work, we face the task of disambiguating companies on the basis of their written names. We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings in a (relatively) low dimensional vector space and use this representation to identify pairs of company names that actually represent the same company (i.e. the same Entity). Given that the manual labelling of string pairs is a rather onerous task, we analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline. With empirical investigations, we show that our proposed Siamese Network outperforms several benchmark approaches based on standard string matching algorithms when enough labelled data are available. Moreover, we show that Active Learning prioritisation is indeed helpful when labelling resources are limited, and let the learning models reach the out-of-sample performance saturation with less labelled data with respect to standard (random) data labelling approaches.
    Fast kernel methods for Data Quality Monitoring as a goodness-of-fit test. (arXiv:2303.05413v1 [hep-ex])
    We here propose a machine learning approach for monitoring particle detectors in real-time. The goal is to assess the compatibility of incoming experimental data with a reference dataset, characterising the data behaviour under normal circumstances, via a likelihood-ratio hypothesis test. The model is based on a modern implementation of kernel methods, nonparametric algorithms that can learn any continuous function given enough data. The resulting approach is efficient and agnostic to the type of anomaly that may be present in the data. Our study demonstrates the effectiveness of this strategy on multivariate data from drift tube chamber muon detectors.
    Automatic Detection of Industry Sectors in Legal Articles Using Machine Learning Approaches. (arXiv:2303.05387v1 [cs.CL])
    The ability to automatically identify industry sector coverage in articles on legal developments, or any kind of news articles for that matter, can bring plentiful of benefits both to the readers and the content creators themselves. By having articles tagged based on industry coverage, readers from all around the world would be able to get to legal news that are specific to their region and professional industry. Simultaneously, writers would benefit from understanding which industries potentially lack coverage or which industries readers are currently mostly interested in and thus, they would focus their writing efforts towards more inclusive and relevant legal news coverage. In this paper, a Machine Learning-powered industry analysis approach which combined Natural Language Processing (NLP) with Statistical and Machine Learning (ML) techniques was investigated. A dataset consisting of over 1,700 annotated legal articles was created for the identification of six industry sectors. Text and legal based features were extracted from the text. Both traditional ML methods (e.g. gradient boosting machine algorithms, and decision-tree based algorithms) and deep neural network (e.g. transformer models) were applied for performance comparison of predictive models. The system achieved promising results with area under the receiver operating characteristic curve scores above 0.90 and F-scores above 0.81 with respect to the six industry sectors. The experimental results show that the suggested automated industry analysis which employs ML techniques allows the processing of large collections of text data in an easy, efficient, and scalable way. Traditional ML methods perform better than deep neural networks when only a small and domain-specific training data is available for the study.
    Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning. (arXiv:2303.05479v1 [cs.LG])
    A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets, which allows efficient fine-tuning with limited amounts of active online interaction. However, several existing offline RL methods tend to exhibit poor online fine-tuning performance. On the other hand, online RL methods can learn effectively through online interaction, but struggle to incorporate offline data, which can make them very slow in settings where exploration is challenging or pre-training is necessary. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities. Our approach, calibrated Q-learning (Cal-QL) accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q-values are at a reasonable scale. We refer to this property as calibration, and define it formally as providing a lower bound on the true value function of the learned policy and an upper bound on the value of some other (suboptimal) reference policy, which may simply be the behavior policy. We show that offline RL algorithms that learn such calibrated value functions lead to effective online fine-tuning, enabling us to take the benefits of offline initializations in online fine-tuning. In practice, Cal-QL can be implemented on top of existing conservative methods for offline RL within a one-line code change. Empirically, Cal-QL outperforms state-of-the-art methods on 10/11 fine-tuning benchmark tasks that we study in this paper.
    Aux-Drop: Handling Haphazard Inputs in Online Learning Using Auxiliary Dropouts. (arXiv:2303.05155v1 [cs.LG])
    Many real-world applications based on online learning produce streaming data that is haphazard in nature, i.e., contains missing features, features becoming obsolete in time, the appearance of new features at later points in time and a lack of clarity on the total number of input features. These challenges make it hard to build a learnable system for such applications, and almost no work exists in deep learning that addresses this issue. In this paper, we present Aux-Drop, an auxiliary dropout regularization strategy for online learning that handles the haphazard input features in an effective manner. Aux-Drop adapts the conventional dropout regularization scheme for the haphazard input feature space ensuring that the final output is minimally impacted by the chaotic appearance of such features. It helps to prevent the co-adaptation of especially the auxiliary and base features, as well as reduces the strong dependence of the output on any of the auxiliary inputs of the model. This helps in better learning for scenarios where certain features disappear in time or when new features are to be modeled. The efficacy of Aux-Drop has been demonstrated through extensive numerical experiments on SOTA benchmarking datasets that include Italy Power Demand, HIGGS, SUSY and multiple UCI datasets.
    disco: a toolkit for Distributional Control of Generative Models. (arXiv:2303.05431v1 [cs.CL])
    Pre-trained language models and other generative models have revolutionized NLP and beyond. However, these models tend to reproduce undesirable biases present in their training data. Also, they may overlook patterns that are important but challenging to capture. To address these limitations, researchers have introduced distributional control techniques. These techniques, not limited to language, allow controlling the prevalence (i.e., expectations) of any features of interest in the model's outputs. Despite their potential, the widespread adoption of these techniques has been hindered by the difficulty in adapting complex, disconnected code. Here, we present disco, an open-source Python library that brings these techniques to the broader public.
    Bounding the Probabilities of Benefit and Harm Through Sensitivity Parameters and Proxies. (arXiv:2303.05396v1 [stat.ME])
    We present two methods for bounding the probabilities of benefit and harm under unmeasured confounding. The first method computes the (upper or lower) bound of either probability as a function of the observed data distribution and two intuitive sensitivity parameters which, then, can be presented to the analyst as a 2-D plot to assist her in decision making. The second method assumes the existence of a measured nondifferential proxy (i.e., direct effect) of the unmeasured confounder. Using this proxy, tighter bounds than the existing ones can be derived from just the observed data distribution.
    Hair and Scalp Disease Detection using Machine Learning and Image Processing. (arXiv:2301.00122v2 [cs.CV] UPDATED)
    Almost 80 million Americans suffer from hair loss due to aging, stress, medication, or genetic makeup. Hair and scalp-related diseases often go unnoticed in the beginning. Sometimes, a patient cannot differentiate between hair loss and regular hair fall. Diagnosing hair-related diseases is time-consuming as it requires professional dermatologists to perform visual and medical tests. Because of that, the overall diagnosis gets delayed, which worsens the severity of the illness. Due to the image-processing ability, neural network-based applications are used in various sectors, especially healthcare and health informatics, to predict deadly diseases like cancers and tumors. These applications assist clinicians and patients and provide an initial insight into early-stage symptoms. In this study, we used a deep learning approach that successfully predicts three main types of hair loss and scalp-related diseases: alopecia, psoriasis, and folliculitis. However, limited study in this area, unavailability of a proper dataset, and degree of variety among the images scattered over the internet made the task challenging. 150 images were obtained from various sources and then preprocessed by denoising, image equalization, enhancement, and data balancing, thereby minimizing the error rate. After feeding the processed data into the 2D convolutional neural network (CNN) model, we obtained overall training accuracy of 96.2%, with a validation accuracy of 91.1%. The precision and recall score of alopecia, psoriasis, and folliculitis are 0.895, 0.846, and 1.0, respectively. We also created a dataset of the scalp images for future prospective researchers.
    Penalized Deep Partially Linear Cox Models with Application to CT Scans of Lung Cancer Patients. (arXiv:2303.05341v1 [stat.ML])
    Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective patient-centered therapies. The National Lung Screening Trial (NLST) was a nationwide study aimed at investigating risk factors for lung cancer. The study employed computed tomography texture analysis (CTTA), which provides objective measurements of texture patterns on CT scans, to quantify the mortality risks of lung cancer patients. Partially linear Cox models are becoming a popular tool for modeling survival outcomes, as they effectively handle both established risk factors (such as age and other clinical factors) and new risk factors (such as image features) in a single framework. The challenge in identifying the texture features that impact cancer survival is due to their sensitivity to factors such as scanner type, segmentation, and organ motion. To overcome this challenge, we propose a novel Penalized Deep Partially Linear Cox Model (Penalized DPLC), which incorporates the SCAD penalty to select significant texture features and employs a deep neural network to estimate the nonparametric component of the model accurately. We prove the convergence and asymptotic properties of the estimator and compare it to other methods through extensive simulation studies, evaluating its performance in risk prediction and feature selection. The proposed method is applied to the NLST study dataset to uncover the effects of key clinical and imaging risk factors on patients' survival. Our findings provide valuable insights into the relationship between these factors and survival outcomes.
    SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model. (arXiv:2303.05118v1 [cs.CV])
    The goal of continual learning is to improve the performance of recognition models in learning sequentially arrived data. Although most existing works are established on the premise of learning from scratch, growing efforts have been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit the pre-trained knowledge for each incremental task while maintaining its generalizability remains an open question. In this work, we present an extensive analysis for continual learning on a pre-trained model (CLPM), and attribute the key challenge to a progressive overfitting problem. Observing that selectively reducing the learning rate can almost resolve this issue in the representation layer, we propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA), which further improves the classification layer by modeling the class-wise distributions and aligning the classification layers in a post-hoc fashion. Across a variety of scenarios, our proposal provides substantial improvements for CLPM (e.g., up to 49.76%, 50.05%, 44.69% and 40.16% on Split CIFAR-100, Split ImageNet-R, Split CUB-200 and Split Cars-196, respectively), and thus outperforms state-of-the-art approaches by a large margin. Based on such a strong baseline, critical factors and promising directions are analyzed in-depth to facilitate subsequent research.
    Conceptual Reinforcement Learning for Language-Conditioned Tasks. (arXiv:2303.05069v1 [cs.LG])
    Despite the broad application of deep reinforcement learning (RL), transferring and adapting the policy to unseen but similar environments is still a significant challenge. Recently, the language-conditioned policy is proposed to facilitate policy transfer through learning the joint representation of observation and text that catches the compact and invariant information across environments. Existing studies of language-conditioned RL methods often learn the joint representation as a simple latent layer for the given instances (episode-specific observation and text), which inevitably includes noisy or irrelevant information and cause spurious correlations that are dependent on instances, thus hurting generalization performance and training efficiency. To address this issue, we propose a conceptual reinforcement learning (CRL) framework to learn the concept-like joint representation for language-conditioned policy. The key insight is that concepts are compact and invariant representations in human cognition through extracting similarities from numerous instances in real-world. In CRL, we propose a multi-level attention encoder and two mutual information constraints for learning compact and invariant concepts. Verified in two challenging environments, RTFM and Messenger, CRL significantly improves the training efficiency (up to 70%) and generalization ability (up to 30%) to the new environment dynamics.
    Fast post-process Bayesian inference with Sparse Variational Bayesian Monte Carlo. (arXiv:2303.05263v1 [stat.ML])
    We introduce Sparse Variational Bayesian Monte Carlo (SVBMC), a method for fast "post-process" Bayesian inference for models with black-box and potentially noisy likelihoods. SVBMC reuses all existing target density evaluations -- for example, from previous optimizations or partial Markov Chain Monte Carlo runs -- to build a sparse Gaussian process (GP) surrogate model of the log posterior density. Uncertain regions of the surrogate are then refined via active learning as needed. Our work builds on the Variational Bayesian Monte Carlo (VBMC) framework for sample-efficient inference, with several novel contributions. First, we make VBMC scalable to a large number of pre-existing evaluations via sparse GP regression, deriving novel Bayesian quadrature formulae and acquisition functions for active learning with sparse GPs. Second, we introduce noise shaping, a general technique to induce the sparse GP approximation to focus on high posterior density regions. Third, we prove theoretical results in support of the SVBMC refinement procedure. We validate our method on a variety of challenging synthetic scenarios and real-world applications. We find that SVBMC consistently builds good posterior approximations by post-processing of existing model evaluations from different sources, often requiring only a small number of additional density evaluations.
    Prevalence and major risk factors of non-communicable diseases: A Hospital-based Cross-Sectional Study in Dhaka, Bangladesh. (arXiv:2303.04808v1 [q-bio.QM])
    Objective: The study aimed to determine the prevalence of several non-communicable diseases (NCD) and analyze risk factors among adult patients seeking nutritional guidance in Dhaka, Bangladesh. Result: Our study observed the relationships between gender, age groups, obesity, and NCDs (DM, CKD, IBS, CVD, CRD, thyroid). The most frequently reported NCD was cardiovascular issues (CVD), which was present in 83.56% of all participants. CVD was more common in male participants. Consequently, male participants had a higher blood pressure distribution than females. Diabetes mellitus (DM), on the other hand, did not have a gender-based inclination. Both CVD and DM had an age-based progression. Our study showed that chronic respiratory illness was more frequent in middle-aged participants than in younger or elderly individuals. Based on the data, every one in five hospitalized patients was obese. We analyzed the co-morbidities and found that 31.5% of the population has only one NCD, 30.1% has two NCDs, and 38.3% has more than two NCDs. Besides, 86.25% of all diabetic patients had cardiovascular issues. All thyroid patients in our study had CVD. Using a t-test, we found a relationship between CKD and thyroid (p-value 0.061). Males under 35 years have a statistically significant relationship between thyroid and chronic respiratory diseases (p-value 0.018). We also found an association between DM and CKD among patients over 65 (p-value 0.038). Moreover, there has been a statistically significant relationship between CKD and Thyroid (P < 0.05) for those below 35 and 35-65. We used a two-way ANOVA test to find the statistically significant interaction of heart issues and chronic respiratory illness, in combination with diabetes. The combination of DM and RTI also affected CKD in male patients over 65 years old.
    SSL^2: Self-Supervised Learning meets Semi-Supervised Learning: Multiple Sclerosis Segmentation in 7T-MRI from large-scale 3T-MRI. (arXiv:2303.05026v1 [cs.CV])
    Automated segmentation of multiple sclerosis (MS) lesions from MRI scans is important to quantify disease progression. In recent years, convolutional neural networks (CNNs) have shown top performance for this task when a large amount of labeled data is available. However, the accuracy of CNNs suffers when dealing with few and/or sparsely labeled datasets. A potential solution is to leverage the information available in large public datasets in conjunction with a target dataset which only has limited labeled data. In this paper, we propose a training framework, SSL2 (self-supervised-semi-supervised), for multi-modality MS lesion segmentation with limited supervision. We adopt self-supervised learning to leverage the knowledge from large public 3T datasets to tackle the limitations of a small 7T target dataset. To leverage the information from unlabeled 7T data, we also evaluate state-of-the-art semi-supervised methods for other limited annotation settings, such as small labeled training size and sparse annotations. We use the shifted-window (Swin) transformer1 as our backbone network. The effectiveness of self-supervised and semi-supervised training strategies is evaluated in our in-house 7T MRI dataset. The results indicate that each strategy improves lesion segmentation for both limited training data size and for sparse labeling scenarios. The combined overall framework further improves the performance substantially compared to either of its components alone. Our proposed framework thus provides a promising solution for future data/label-hungry 7T MS studies.
    In search of the most efficient and memory-saving visualization of high dimensional data. (arXiv:2303.05455v1 [cs.LG])
    Interactive exploration of large, multidimensional datasets plays a very important role in various scientific fields. It makes it possible not only to identify important structural features and forms, such as clusters of vertices and their connection patterns, but also to evaluate their interrelationships in terms of position, distance, shape and connection density. We argue that the visualization of multidimensional data is well approximated by the problem of two-dimensional embedding of undirected nearest-neighbor graphs. The size of complex networks is a major challenge for today's computer systems and still requires more efficient data embedding algorithms. Existing reduction methods are too slow and do not allow interactive manipulation. We show that high-quality embeddings are produced with minimal time and memory complexity. We present very efficient IVHD algorithms (CPU and GPU) and compare them with the latest and most popular dimensionality reduction methods. We show that the memory and time requirements are dramatically lower than for base codes. At the cost of a slight degradation in embedding quality, IVHD preserves the main structural properties of the data well with a much lower time budget. We also present a meta-algorithm that allows the use of any unsupervised data embedding method in a supervised manner.
    GOATS: Goal Sampling Adaptation for Scooping with Curriculum Reinforcement Learning. (arXiv:2303.05193v1 [cs.RO])
    In this work, we first formulate the problem of goal-conditioned robotic water scooping with reinforcement learning. This task is challenging due to the complex dynamics of fluid and multi-modal goal-reaching. The policy is required to achieve both position goals and water amount goals, which leads to a large convoluted goal state space. To address these challenges, we introduce Goal Sampling Adaptation for Scooping (GOATS), a curriculum reinforcement learning method that can learn an effective and generalizable policy for robot scooping tasks. Specifically, we use a goal-factorized reward formulation and interpolate position goal distributions and amount goal distributions to create curriculum through the learning process. As a result, our proposed method can outperform the baselines in simulation and achieves 5.46% and 8.71% amount errors on bowl scooping and bucket scooping tasks, respectively, under 1000 variations of initial water states in the tank and a large goal state space. Besides being effective in simulation environments, our method can efficiently generalize to noisy real-robot water-scooping scenarios with different physical configurations and unseen settings, demonstrating superior efficacy and generalizability. The videos of this work are available on our project page: https://sites.google.com/view/goatscooping.
    Dynamic Stashing Quantization for Efficient Transformer Training. (arXiv:2303.05295v1 [cs.LG])
    Large Language Models (LLMs) have demonstrated impressive performance on a range of Natural Language Processing (NLP) tasks. Unfortunately, the immense amount of computations and memory accesses required for LLM training makes them prohibitively expensive in terms of hardware cost, and thus challenging to deploy in use cases such as on-device learning. In this paper, motivated by the observation that LLM training is memory-bound, we propose a novel dynamic quantization strategy, termed Dynamic Stashing Quantization (DSQ), that puts a special focus on reducing the memory operations, but also enjoys the other benefits of low precision training, such as the reduced arithmetic cost. We conduct a thorough study on two translation tasks (trained-from-scratch) and three classification tasks (fine-tuning). DSQ reduces the amount of arithmetic operations by $20.95\times$ and the number of DRAM operations by $2.55\times$ on IWSLT17 compared to the standard 16-bit fixed-point, which is widely used in on-device learning.
    StyleDiff: Attribute Comparison Between Unlabeled Datasets in Latent Disentangled Space. (arXiv:2303.05102v1 [stat.ML])
    One major challenge in machine learning applications is coping with mismatches between the datasets used in the development and those obtained in real-world applications. These mismatches may lead to inaccurate predictions and errors, resulting in poor product quality and unreliable systems. In this study, we propose StyleDiff to inform developers of the differences between the two datasets for the steady development of machine learning systems. Using disentangled image spaces obtained from recently proposed generative models, StyleDiff compares the two datasets by focusing on attributes in the images and provides an easy-to-understand analysis of the differences between the datasets. The proposed StyleDiff performs in $O (d N\log N)$, where $N$ is the size of the datasets and $d$ is the number of attributes, enabling the application to large datasets. We demonstrate that StyleDiff accurately detects differences between datasets and presents them in an understandable format using, for example, driving scenes datasets.
    Efficient Certified Training and Robustness Verification of Neural ODEs. (arXiv:2303.05246v1 [cs.LG])
    Neural Ordinary Differential Equations (NODEs) are a novel neural architecture, built around initial value problems with learned dynamics which are solved during inference. Thought to be inherently more robust against adversarial perturbations, they were recently shown to be vulnerable to strong adversarial attacks, highlighting the need for formal guarantees. However, despite significant progress in robustness verification for standard feed-forward architectures, the verification of high dimensional NODEs remains an open problem. In this work, we address this challenge and propose GAINS, an analysis framework for NODEs combining three key ideas: (i) a novel class of ODE solvers, based on variable but discrete time steps, (ii) an efficient graph representation of solver trajectories, and (iii) a novel abstraction algorithm operating on this graph representation. Together, these advances enable the efficient analysis and certified training of high-dimensional NODEs, by reducing the runtime from an intractable $O(\exp(d)+\exp(T))$ to ${O}(d+T^2 \log^2T)$ in the dimensionality $d$ and integration time $T$. In an extensive evaluation on computer vision (MNIST and FMNIST) and time-series forecasting (PHYSIO-NET) problems, we demonstrate the effectiveness of both our certified training and verification methods.
    ChatGPT is on the horizon: Could a large language model be all we need for Intelligent Transportation?. (arXiv:2303.05382v1 [cs.CL])
    ChatGPT, developed by OpenAI, is one of the largest Large Language Models (LLM) with over 175 billion parameters. ChatGPT has demonstrated the impressive capabilities of LLM, particularly in the field of natural language processing (NLP). With the emergence of the discussion and application of LLM in various research or engineering domains, it is time to envision how LLM may revolutionize the way we approach intelligent transportation systems. This paper explores the future applications of LLM in addressing key transportation problems. By leveraging LLM and a cross-modal encoder, an intelligent system can handle traffic data from various modalities and execute transportation operations through a single LLM. NLP, combined with cross-modal processing, is investigated with its potential applications in transportation. To demonstrate this potential, a smartphone-based crash report auto-generation and analysis framework is presented as a use case. Despite the potential benefits, challenges related to data privacy, data quality, and model bias must be considered. Overall, the use of LLM in intelligent transport systems holds promise for more efficient, intelligent, and sustainable transportation systems that improve the lives of people around the world.
    German BERT Model for Legal Named Entity Recognition. (arXiv:2303.05388v1 [cs.CL])
    The use of BERT, one of the most popular language models, has led to improvements in many Natural Language Processing (NLP) tasks. One such task is Named Entity Recognition (NER) i.e. automatic identification of named entities such as location, person, organization, etc. from a given text. It is also an important base step for many NLP tasks such as information extraction and argumentation mining. Even though there is much research done on NER using BERT and other popular language models, the same is not explored in detail when it comes to Legal NLP or Legal Tech. Legal NLP applies various NLP techniques such as sentence similarity or NER specifically on legal data. There are only a handful of models for NER tasks using BERT language models, however, none of these are aimed at legal documents in German. In this paper, we fine-tune a popular BERT language model trained on German data (German BERT) on a Legal Entity Recognition (LER) dataset. To make sure our model is not overfitting, we performed a stratified 10-fold cross-validation. The results we achieve by fine-tuning German BERT on the LER dataset outperform the BiLSTM-CRF+ model used by the authors of the same LER dataset. Finally, we make the model openly available via HuggingFace.
    Taming Contrast Maximization for Learning Sequential, Low-latency, Event-based Optical Flow. (arXiv:2303.05214v1 [cs.CV])
    Event cameras have recently gained significant traction since they open up new avenues for low-latency and low-power solutions to complex computer vision problems. To unlock these solutions, it is necessary to develop algorithms that can leverage the unique nature of event data. However, the current state-of-the-art is still highly influenced by the frame-based literature, and usually fails to deliver on these promises. In this work, we take this into consideration and propose a novel self-supervised learning pipeline for the sequential estimation of event-based optical flow that allows for the scaling of the models to high inference frequencies. At its core, we have a continuously-running stateful neural model that is trained using a novel formulation of contrast maximization that makes it robust to nonlinearities and varying statistics in the input events. Results across multiple datasets confirm the effectiveness of our method, which establishes a new state of the art in terms of accuracy for approaches trained or optimized without ground truth.
    Invertible Kernel PCA with Random Fourier Features. (arXiv:2303.05043v1 [cs.LG])
    Kernel principal component analysis (kPCA) is a widely studied method to construct a low-dimensional data representation after a nonlinear transformation. The prevailing method to reconstruct the original input signal from kPCA -- an important task for denoising -- requires us to solve a supervised learning problem. In this paper, we present an alternative method where the reconstruction follows naturally from the compression step. We first approximate the kernel with random Fourier features. Then, we exploit the fact that the nonlinear transformation is invertible in a certain subdomain. Hence, the name \emph{invertible kernel PCA (ikPCA)}. We experiment with different data modalities and show that ikPCA performs similarly to kPCA with supervised reconstruction on denoising tasks, making it a strong alternative.
    Mastering Strategy Card Game (Hearthstone) with Improved Techniques. (arXiv:2303.05197v1 [cs.LG])
    Strategy card game is a well-known genre that is demanding on the intelligent game-play and can be an ideal test-bench for AI. Previous work combines an end-to-end policy function and an optimistic smooth fictitious play, which shows promising performances on the strategy card game Legend of Code and Magic. In this work, we apply such algorithms to Hearthstone, a famous commercial game that is more complicated in game rules and mechanisms. We further propose several improved techniques and consequently achieve significant progress. For a machine-vs-human test we invite a Hearthstone streamer whose best rank was top 10 of the official league in China region that is estimated to be of millions of players. Our models defeat the human player in all Best-of-5 tournaments of full games (including both deck building and battle), showing a strong capability of decision making.
    Euler Characteristic Transform Based Topological Loss for Reconstructing 3D Images from Single 2D Slices. (arXiv:2303.05286v1 [cs.LG])
    The computer vision task of reconstructing 3D images, i.e., shapes, from their single 2D image slices is extremely challenging, more so in the regime of limited data. Deep learning models typically optimize geometric loss functions, which may lead to poor reconstructions as they ignore the structural properties of the shape. To tackle this, we propose a novel topological loss function based on the Euler Characteristic Transform. This loss can be used as an inductive bias to aid the optimization of any neural network toward better reconstructions in the regime of limited data. We show the effectiveness of the proposed loss function by incorporating it into SHAPR, a state-of-the-art shape reconstruction model, and test it on two benchmark datasets, viz., Red Blood Cells and Nuclei datasets. We also show a favourable property, namely injectivity and discuss the stability of the topological loss function based on the Euler Characteristic Transform.
    The joint node degree distribution in the Erd\H{o}s-R\'enyi network. (arXiv:2303.05138v1 [stat.ML])
    The Erd\H{o}s-R\'enyi random graph is the simplest model for node degree distribution, and it is one of the most widely studied. In this model, pairs of $n$ vertices are selected and connected uniformly at random with probability $p$, consequently, the degrees for a given vertex follow the binomial distribution. If the number of vertices is large, the binomial can be approximated by Normal using the Central Limit Theorem, which is often allowed when $\min (np, n(1-p)) > 5$. This is true for every node independently. However, due to the fact that the degrees of nodes in a graph are not independent, we aim in this paper to test whether the degrees of per node collectively in the Erd\H{o}s-R\'enyi graph have a multivariate normal distribution MVN. A chi square goodness of fit test for the hypothesis that binomial is a distribution for the whole set of nodes is rejected because of the dependence between degrees. Before testing MVN we show that the covariance and correlation between the degrees of any pair of nodes in the graph are $p(1-p)$ and $1/(n-1)$, respectively. We test MVN considering two assumptions: independent and dependent degrees, and we obtain our results based on the percentages of rejected statistics of chi square, the $p$-values of Anderson Darling test, and a CDF comparison. We always achieve a good fit of multivariate normal distribution with large values of $n$ and $p$, and very poor fit when $n$ or $p$ are very small. The approximation seems valid when $np \geq 10$. We also compare the maximum likelihood estimate of $p$ in MVN distribution where we assume independence and dependence. The estimators are assessed using bias, variance and mean square error.
    Provable Data Subset Selection For Efficient Neural Network Training. (arXiv:2303.05151v1 [cs.LG])
    Radial basis function neural networks (\emph{RBFNN}) are {well-known} for their capability to approximate any continuous function on a closed bounded set with arbitrary precision given enough hidden neurons. In this paper, we introduce the first algorithm to construct coresets for \emph{RBFNNs}, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network and thus approximate any function defined by an \emph{RBFNN} on the larger input data. In particular, we construct coresets for radial basis and Laplacian loss functions. We then use our coresets to obtain a provable data subset selection algorithm for training deep neural networks. Since our coresets approximate every function, they also approximate the gradient of each weight in a neural network, which is a particular function on the input. We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets, demonstrating the efficacy and accuracy of our coreset construction.
    A Framework for History-Aware Hyperparameter Optimisation in Reinforcement Learning. (arXiv:2303.05186v1 [cs.LG])
    A Reinforcement Learning (RL) system depends on a set of initial conditions (hyperparameters) that affect the system's performance. However, defining a good choice of hyperparameters is a challenging problem. Hyperparameter tuning often requires manual or automated searches to find optimal values. Nonetheless, a noticeable limitation is the high cost of algorithm evaluation for complex models, making the tuning process computationally expensive and time-consuming. In this paper, we propose a framework based on integrating complex event processing and temporal models, to alleviate these trade-offs. Through this combination, it is possible to gain insights about a running RL system efficiently and unobtrusively based on data stream monitoring and to create abstract representations that allow reasoning about the historical behaviour of the RL system. The obtained knowledge is exploited to provide feedback to the RL system for optimising its hyperparameters while making effective use of parallel resources. We introduce a novel history-aware epsilon-greedy logic for hyperparameter optimisation that instead of using static hyperparameters that are kept fixed for the whole training, adjusts the hyperparameters at runtime based on the analysis of the agent's performance over time windows in a single agent's lifetime. We tested the proposed approach in a 5G mobile communications case study that uses DQN, a variant of RL, for its decision-making. Our experiments demonstrated the effects of hyperparameter tuning using history on training stability and reward values. The encouraging results show that the proposed history-aware framework significantly improved performance compared to traditional hyperparameter tuning approaches.
    Segmentation method for cerebral blood vessels from MRA using hysteresis. (arXiv:2303.05113v1 [eess.IV])
    Segmentation of cerebral blood vessels from Magnetic Resonance Imaging (MRI) is an open problem that could be solved with deep learning (DL). However, annotated data for training is often scarce. Due to the absence of open-source tools, we aim to develop a classical segmentation method that generates vessel ground truth from Magnetic Resonance Angiography for DL training of segmentation across a variety of modalities. The method combines size-specific Hessian filters, hysteresis thresholding and connected component correction. The optimal choice of processing steps was evaluated with a blinded scoring by a clinician using 24 3D images. The results show that all method steps are necessary to produce the highest (14.2/15) vessel segmentation quality score. Omitting the connected component correction caused the largest quality loss. The method, which is available on GitHub, can be used to train DL models for vessel segmentation.
    Gauges and Accelerated Optimization over Smooth and/or Strongly Convex Sets. (arXiv:2303.05037v1 [math.OC])
    We consider feasibility and constrained optimization problems defined over smooth and/or strongly convex sets. These notions mirror their popular function counterparts but are much less explored in the first-order optimization literature. We propose new scalable, projection-free, accelerated first-order methods in these settings. Our methods avoid linear optimization or projection oracles, only using cheap one-dimensional linesearches and normal vector computations. Despite this, we derive optimal accelerated convergence guarantees of $O(1/T)$ for strongly convex problems, $O(1/T^2)$ for smooth problems, and accelerated linear convergence given both. Our algorithms and analysis are based on novel characterizations of the Minkowski gauge of smooth and/or strongly convex sets, which may be of independent interest: although the gauge is neither smooth nor strongly convex, we show the gauge squared inherits any structure present in the set.
    A Study of Variable-Role-based Feature Enrichment in Neural Models of Code. (arXiv:2303.04942v1 [cs.LG])
    Although deep neural models substantially reduce the overhead of feature engineering, the features readily available in the inputs might significantly impact training cost and the performance of the models. In this paper, we explore the impact of an unsuperivsed feature enrichment approach based on variable roles on the performance of neural models of code. The notion of variable roles (as introduced in the works of Sajaniemi et al. [Refs. 1,2]) has been found to help students' abilities in programming. In this paper, we investigate if this notion would improve the performance of neural models of code. To the best of our knowledge, this is the first work to investigate how Sajaniemi et al.'s concept of variable roles can affect neural models of code. In particular, we enrich a source code dataset by adding the role of individual variables in the dataset programs, and thereby conduct a study on the impact of variable role enrichment in training the Code2Seq model. In addition, we shed light on some challenges and opportunities in feature enrichment for neural code intelligence models.
    Memory-adaptive Depth-wise Heterogenous Federated Learning. (arXiv:2303.04887v1 [cs.LG])
    Federated learning is a promising paradigm that allows multiple clients to collaboratively train a model without sharing the local data. However, the presence of heterogeneous devices in federated learning, such as mobile phones and IoT devices with varying memory capabilities, would limit the scale and hence the performance of the model could be trained. The mainstream approaches to address memory limitations focus on width-slimming techniques, where different clients train subnetworks with reduced widths locally and then the server aggregates the subnetworks. The global model produced from these methods suffers from performance degradation due to the negative impact of the actions taken to handle the varying subnetwork widths in the aggregation phase. In this paper, we introduce a memory-adaptive depth-wise learning solution in FL called FeDepth, which adaptively decomposes the full model into blocks according to the memory budgets of each client and trains blocks sequentially to obtain a full inference model. Our method outperforms state-of-the-art approaches, achieving 5% and more than 10% improvements in top-1 accuracy on CIFAR-10 and CIFAR-100, respectively. We also demonstrate the effectiveness of depth-wise fine-tuning on ViT. Our findings highlight the importance of memory-aware techniques for federated learning with heterogeneous devices and the success of depth-wise training strategy in improving the global model's performance.
    Baldur: Whole-Proof Generation and Repair with Large Language Models. (arXiv:2303.04910v1 [cs.LG])
    Formally verifying software properties is a highly desirable but labor-intensive task. Recent work has developed methods to automate formal verification using proof assistants, such as Coq and Isabelle/HOL, e.g., by training a model to predict one proof step at a time, and using that model to search through the space of possible proofs. This paper introduces a new method to automate formal verification: We use large language models, trained on natural language text and code and fine-tuned on proofs, to generate whole proofs for theorems at once, rather than one step at a time. We combine this proof generation model with a fine-tuned repair model to repair generated proofs, further increasing proving power. As its main contributions, this paper demonstrates for the first time that: (1) Whole-proof generation using transformers is possible and is as effective as search-based techniques without requiring costly search. (2) Giving the learned model additional context, such as a prior failed proof attempt and the ensuing error message, results in proof repair and further improves automated proof generation. (3) We establish a new state of the art for fully automated proof synthesis. We reify our method in a prototype, Baldur, and evaluate it on a benchmark of 6,336 Isabelle/HOL theorems and their proofs. In addition to empirically showing the effectiveness of whole-proof generation, repair, and added context, we show that Baldur improves on the state-of-the-art tool, Thor, by automatically generating proofs for an additional 8.7% of the theorems. Together, Baldur and Thor can prove 65.7% of the theorems fully automatically. This paper paves the way for new research into using large language models for automating formal verification.
    Out-of-distribution Detection with Implicit Outlier Transformation. (arXiv:2303.05033v1 [cs.LG])
    Outlier exposure (OE) is powerful in out-of-distribution (OOD) detection, enhancing detection capability via model fine-tuning with surrogate OOD data. However, surrogate data typically deviate from test OOD data. Thus, the performance of OE, when facing unseen OOD data, can be weakened. To address this issue, we propose a novel OE-based approach that makes the model perform well for unseen OOD situations, even for unseen OOD cases. It leads to a min-max learning scheme -- searching to synthesize OOD data that leads to worst judgments and learning from such OOD data for uniform performance in OOD detection. In our realization, these worst OOD data are synthesized by transforming original surrogate ones. Specifically, the associated transform functions are learned implicitly based on our novel insight that model perturbation leads to data transformation. Our methodology offers an efficient way of synthesizing OOD data, which can further benefit the detection model, besides the surrogate OOD data. We conduct extensive experiments under various OOD detection setups, demonstrating the effectiveness of our method against its advanced counterparts.
    Dish-TS: A General Paradigm for Alleviating Distribution Shift in Time Series Forecasting. (arXiv:2302.14829v2 [cs.LG] UPDATED)
    The distribution shift in Time Series Forecasting (TSF), indicating series distribution changes over time, largely hinders the performance of TSF models. Existing works towards distribution shift in time series are mostly limited in the quantification of distribution and, more importantly, overlook the potential shift between lookback and horizon windows. To address above challenges, we systematically summarize the distribution shift in TSF into two categories. Regarding lookback windows as input-space and horizon windows as output-space, there exist (i) intra-space shift, that the distribution within the input-space keeps shifted over time, and (ii) inter-space shift, that the distribution is shifted between input-space and output-space. Then we introduce, Dish-TS, a general neural paradigm for alleviating distribution shift in TSF. Specifically, for better distribution estimation, we propose the coefficient net (CONET), which can be any neural architectures, to map input sequences into learnable distribution coefficients. To relieve intra-space and inter-space shift, we organize Dish-TS as a Dual-CONET framework to separately learn the distribution of input- and output-space, which naturally captures the distribution difference of two spaces. In addition, we introduce a more effective training strategy for intractable CONET learning. Finally, we conduct extensive experiments on several datasets coupled with different state-of-the-art forecasting models. Experimental results show Dish-TS consistently boosts them with a more than 20% average improvement. Code is available.
    Scalable Stochastic Gradient Riemannian Langevin Dynamics in Non-Diagonal Metrics. (arXiv:2303.05101v1 [cs.LG])
    Bayesian neural network inference is often carried out using stochastic gradient sampling methods. For best performance the methods should use a Riemannian metric that improves posterior exploration by accounting for the local curvature, but the existing methods resort to simple diagonal metrics to remain computationally efficient. This loses some of the gains. We propose two non-diagonal metrics that can be used in stochastic samplers to improve convergence and exploration but that have only a minor computational overhead over diagonal metrics. We show that for neural networks with complex posteriors, caused e.g. by use of sparsity-inducing priors, using these metrics provides clear improvements. For some other choices the posterior is sufficiently easy also for the simpler metrics.
    Group Fairness in Non-monotone Submodular Maximization. (arXiv:2302.01546v2 [cs.LG] UPDATED)
    Maximizing a submodular function has a wide range of applications in machine learning and data mining. One such application is data summarization whose goal is to select a small set of representative and diverse data items from a large dataset. However, data items might have sensitive attributes such as race or gender, in this setting, it is important to design \emph{fairness-aware} algorithms to mitigate potential algorithmic bias that may cause over- or under- representation of particular groups. Motivated by that, we propose and study the classic non-monotone submodular maximization problem subject to novel group fairness constraints. Our goal is to select a set of items that maximizes a non-monotone submodular function, while ensuring that the number of selected items from each group is proportionate to its size, to the extent specified by the decision maker. We develop the first constant-factor approximation algorithms for this problem. We also extend the basic model to incorporate an additional global size constraint on the total number of selected items.
    The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types. (arXiv:2208.10687v2 [cs.LG] UPDATED)
    When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Prior work typically sets the rationality level to a constant value, regardless of the type, or quality, of human feedback. However, in many settings, giving one type of feedback (e.g. a demonstration) may be much more difficult than a different type of feedback (e.g. answering a comparison query). Thus, we expect to see more or less noise depending on the type of human feedback. In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning. We test this in both simulated experiments and in a user study with real human feedback. We find that overestimating human rationality can have dire effects on reward learning accuracy and regret. We also find that fitting the rationality coefficient to human data enables better reward learning, even when the human deviates significantly from the noisy-rational choice model due to systematic biases. Further, we find that the rationality level affects the informativeness of each feedback type: surprisingly, demonstrations are not always the most informative -- when the human acts very suboptimally, comparisons actually become more informative, even when the rationality level is the same for both. Ultimately, our results emphasize the importance and advantage of paying attention to the assumed human-rationality level, especially when agents actively learn from multiple types of human feedback.
    Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer. (arXiv:2211.16607v2 [cs.LG] UPDATED)
    When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the greatest joint correlations within the target stream, which may ignore a crucial but small information transfer from the source to the target stream. As well, there are often situations where the target stream may have previously been modelled independently and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. As such, TEB provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.
    Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test. (arXiv:2303.04930v1 [cs.LG])
    Machine learning and deep learning have been used extensively to classify physical surfaces through images and time-series contact data. However, these methods rely on human expertise and entail the time-consuming processes of data and parameter tuning. To overcome these challenges, we propose an easily implemented framework that can directly handle heterogeneous data sources for classification tasks. Our data-versus-data approach automatically quantifies distinctive differences in distributions in a high-dimensional space via kernel two-sample testing between two sets extracted from multimodal data (e.g., images, sounds, haptic signals). We demonstrate the effectiveness of our technique by benchmarking against expertly engineered classifiers for visual-audio-haptic surface recognition due to the industrial relevance, difficulty, and competitive baselines of this application; ablation studies confirm the utility of key components of our pipeline. As shown in our open-source code, we achieve 97.2% accuracy on a standard multi-user dataset with 108 surface classes, outperforming the state-of-the-art machine-learning algorithm by 6% on a more difficult version of the task. The fact that our classifier obtains this performance with minimal data processing in the standard algorithm setting reinforces the powerful nature of kernel methods for learning to recognize complex patterns.
    Energy-Latency Attacks via Sponge Poisoning. (arXiv:2203.08147v3 [cs.CR] UPDATED)
    Sponge examples are test-time inputs carefully optimized to increase energy consumption and latency of neural networks when deployed on hardware accelerators. In this work, we are the first to demonstrate that sponge examples can also be injected at training time, via an attack that we call sponge poisoning. This attack allows one to increase the energy consumption and latency of machine-learning models indiscriminately on each test-time input. We present a novel formalization for sponge poisoning, overcoming the limitations related to the optimization of test-time sponge examples, and show that this attack is possible even if the attacker only controls a few model updates; for instance, if model training is outsourced to an untrusted third-party or distributed via federated learning. Our extensive experimental analysis shows that sponge poisoning can almost completely vanish the effect of hardware accelerators. We also analyze the activations of poisoned models, identifying which components are more vulnerable to this attack. Finally, we examine the feasibility of countermeasures against sponge poisoning to decrease energy consumption, showing that sanitization methods may be overly expensive for most of the users.
    In Defense of the Unitary Scalarization for Deep Multi-Task Learning. (arXiv:2201.04122v4 [cs.LG] UPDATED)
    Recent multi-task learning research argues against unitary scalarization, where training simply minimizes the sum of the task losses. Several ad-hoc multi-task optimization algorithms have instead been proposed, inspired by various hypotheses about what makes multi-task settings difficult. The majority of these optimizers require per-task gradients, and introduce significant memory, runtime, and implementation overhead. We show that unitary scalarization, coupled with standard regularization and stabilization techniques from single-task learning, matches or improves upon the performance of complex multi-task optimizers in popular supervised and reinforcement learning settings. We then present an analysis suggesting that many specialized multi-task optimizers can be partly interpreted as forms of regularization, potentially explaining our surprising results. We believe our results call for a critical reevaluation of recent research in the area.
    Optimistic Whittle Index Policy: Online Learning for Restless Bandits. (arXiv:2205.15372v3 [cs.LG] UPDATED)
    Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm based on the Whittle index policy, using an upper confidence bound (UCB) approach to learn transition dynamics. Specifically, we estimate confidence bounds of the transition probabilities and formulate a bilinear program to compute optimistic Whittle indices using these estimates. Our algorithm, UCWhittle, achieves sublinear $O(H \sqrt{T \log T})$ frequentist regret to solve RMABs with unknown transitions in $T$ episodes with a constant horizon $H$. Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset.
    Towards Good Practices in Evaluating Transfer Adversarial Attacks. (arXiv:2211.09565v2 [cs.CR] UPDATED)
    Transfer adversarial attacks raise critical security concerns in real-world, black-box scenarios. However, the actual progress of this field is difficult to assess due to two common limitations in existing evaluations. First, different methods are often not systematically and fairly evaluated in a one-to-one comparison. Second, only transferability is evaluated but another key attack property, stealthiness, is largely overlooked. In this work, we design good practices to address these limitations, and we present the first comprehensive evaluation of transfer attacks, covering 23 representative attacks against 9 defenses on ImageNet. In particular, we propose to categorize existing attacks into five categories, which enables our systematic category-wise analyses. These analyses lead to new findings that even challenge existing knowledge and also help determine the optimal attack hyperparameters for our attack-wise comprehensive evaluation. We also pay particular attention to stealthiness, by adopting diverse imperceptibility metrics and looking into new, finer-grained characteristics. Overall, our new insights into transferability and stealthiness lead to actionable good practices for future evaluations.
    Compositional optimization of quantum circuits for quantum kernels of support vector machines. (arXiv:2203.13848v3 [quant-ph] UPDATED)
    While quantum machine learning (ML) has been proposed to be one of the most promising applications of quantum computing, how to build quantum ML models that outperform classical ML remains a major open question. Here, we demonstrate a Bayesian algorithm for constructing quantum kernels for support vector machines that adapts quantum gate sequences to data. The algorithm increases the complexity of quantum circuits incrementally by appending quantum gates selected with Bayesian information criterion as circuit selection metric and Bayesian optimization of the parameters of the locally optimal quantum circuits identified. The goal is to build quantum kernels for SVM that can solve classification problems with as little training data as possible. The performance of the resulting quantum models for the classification problems considered here significantly exceeds that of optimized classical models with conventional kernels.
    Backdoor Detection and Mitigation in Competitive Reinforcement Learning. (arXiv:2202.03609v4 [cs.LG] UPDATED)
    While real-world applications of reinforcement learning are becoming popular, the security and robustness of RL systems are worthy of more attention and exploration. In particular, recent works have revealed that, in a multi-agent RL environment, backdoor trigger actions can be injected into a victim agent (a.k.a. Trojan agent), which can result in a catastrophic failure as soon as it sees the backdoor trigger action. To ensure the security of RL agents against malicious backdoors, in this work, we propose the problem of Backdoor Detection in a multi-agent competitive reinforcement learning system, with the objective of detecting Trojan agents as well as the corresponding potential trigger actions, and further trying to mitigate their Trojan behavior. In order to solve this problem, we propose PolicyCleanse that is based on the property that the activated Trojan agents accumulated rewards degrade noticeably after several timesteps. Along with PolicyCleanse, we also design a machine unlearning-based approach that can effectively mitigate the detected backdoor. Extensive experiments demonstrate that the proposed methods can accurately detect Trojan agents, and outperform existing backdoor mitigation baseline approaches by at least 3% in winning rate across various types of agents and environments.
    Learning Stationary Markov Processes with Contrastive Adjustment. (arXiv:2303.05497v1 [cs.LG])
    We introduce a new optimization algorithm, termed \emph{contrastive adjustment}, for learning Markov transition kernels whose stationary distribution matches the data distribution. Contrastive adjustment is not restricted to a particular family of transition distributions and can be used to model data in both continuous and discrete state spaces. Inspired by recent work on noise-annealed sampling, we propose a particular transition operator, the \emph{noise kernel}, that can trade mixing speed for sample fidelity. We show that contrastive adjustment is highly valuable in human-computer design processes, as the stationarity of the learned Markov chain enables local exploration of the data manifold and makes it possible to iteratively refine outputs by human feedback. We compare the performance of noise kernels trained with contrastive adjustment to current state-of-the-art generative models and demonstrate promising results on a variety of image synthesis tasks.
    Restoration based Generative Models. (arXiv:2303.05456v1 [cs.LG])
    Denoising diffusion models (DDMs) have recently attracted increasing attention by showing impressive synthesis quality. DDMs are built on a diffusion process that pushes data to the noise distribution and the models learn to denoise. In this paper, we establish the interpretation of DDMs in terms of image restoration (IR). Integrating IR literature allows us to use an alternative objective and diverse forward processes, not confining to the diffusion process. By imposing prior knowledge on the loss function grounded on MAP-based estimation, we eliminate the need for the expensive sampling of DDMs. Also, we propose a multi-scale training, which improves the performance compared to the diffusion process, by taking advantage of the flexibility of the forward process. Experimental results demonstrate that our model improves the quality and efficiency of both training and inference. Furthermore, we show the applicability of our model to inverse problems. We believe that our framework paves the way for designing a new type of flexible general generative model.
    Dataset of Random Relaxations for Crystal Structure Search of Li-Si System. (arXiv:2012.02920v3 [cond-mat.mtrl-sci] UPDATED)
    Crystal structure search is a long-standing challenge in materials design. We present a dataset of more than 100,000 structural relaxations of potential battery anode materials from randomized structures using density functional theory calculations. We illustrate the usage of the dataset by training graph neural networks to predict structural relaxations from randomly generated structures. Our models directly predict stresses in addition to forces, which allows them to accurately simulate relaxations of both ionic positions and lattice vectors. We show that models trained on the molecular dynamics simulations fail to simulate relaxations from random structures, while training on our data leads to up to two orders of magnitude decrease in error for the same task. Our model is able to find an experimentally verified structure of a stoichiometry held out from training. We find that randomly perturbing atomic positions during training improves both the accuracy and out of domain generalization of the models.
    Efficient Testable Learning of Halfspaces with Adversarial Label Noise. (arXiv:2303.05485v1 [cs.LG])
    We give the first polynomial-time algorithm for the testable learning of halfspaces in the presence of adversarial label noise under the Gaussian distribution. In the recently introduced testable learning model, one is required to produce a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data. Our tester-learner runs in time $\poly(d/\eps)$ and outputs a halfspace with misclassification error $O(\opt)+\eps$, where $\opt$ is the 0-1 error of the best fitting halfspace. At a technical level, our algorithm employs an iterative soft localization technique enhanced with appropriate testers to ensure that the data distribution is sufficiently similar to a Gaussian.
    Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision. (arXiv:2303.05503v1 [cs.CV])
    Many top-down architectures for instance segmentation achieve significant success when trained and tested on pre-defined closed-world taxonomy. However, when deployed in the open world, they exhibit notable bias towards seen classes and suffer from significant performance drop. In this work, we propose a novel approach for open world instance segmentation called bottom-Up and top-Down Open-world Segmentation (UDOS) that combines classical bottom-up segmentation algorithms within a top-down learning framework. UDOS first predicts parts of objects using a top-down network trained with weak supervision from bottom-up segmentations. The bottom-up segmentations are class-agnostic and do not overfit to specific taxonomies. The part-masks are then fed into affinity-based grouping and refinement modules to predict robust instance-level segmentations. UDOS enjoys both the speed and efficiency from the top-down architectures and the generalization ability to unseen categories from bottom-up supervision. We validate the strengths of UDOS on multiple cross-category as well as cross-dataset transfer tasks from 5 challenging datasets including MS-COCO, LVIS, ADE20k, UVO and OpenImages, achieving significant improvements over state-of-the-art across the board. Our code and models are available on our project page.
    Exploiting Contextual Structure to Generate Useful Auxiliary Tasks. (arXiv:2303.05038v1 [cs.AI])
    Reinforcement learning requires interaction with an environment, which is expensive for robots. This constraint necessitates approaches that work with limited environmental interaction by maximizing the reuse of previous experiences. We propose an approach that maximizes experience reuse while learning to solve a given task by generating and simultaneously learning useful auxiliary tasks. To generate these tasks, we construct an abstract temporal logic representation of the given task and leverage large language models to generate context-aware object embeddings that facilitate object replacements. Counterfactual reasoning and off-policy methods allow us to simultaneously learn these auxiliary tasks while solving the given target task. We combine these insights into a novel framework for multitask reinforcement learning and experimentally show that our generated auxiliary tasks share similar underlying exploration requirements as the given task, thereby maximizing the utility of directed exploration. Our approach allows agents to automatically learn additional useful policies without extra environment interaction.
    Kernel Regression with Infinite-Width Neural Networks on Millions of Examples. (arXiv:2303.05420v1 [stat.ML])
    Neural kernels have drastically increased performance on diverse and nonstandard data modalities but require significantly more compute, which previously limited their application to smaller datasets. In this work, we address this by massively parallelizing their computation across many GPUs. We combine this with a distributed, preconditioned conjugate gradients algorithm to enable kernel regression at a large scale (i.e. up to five million examples). Using this approach, we study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset. Using data augmentation to expand the original CIFAR-10 training dataset by a factor of 20, we obtain a test accuracy of 91.2\% (SotA for a pure kernel method). Moreover, we explore neural kernels on other data modalities, obtaining results on protein and small molecule prediction tasks that are competitive with SotA methods.
    Adaptive Calibrator Ensemble for Model Calibration under Distribution Shift. (arXiv:2303.05331v1 [cs.LG])
    Model calibration usually requires optimizing some parameters (e.g., temperature) w.r.t an objective function (e.g., negative log-likelihood). In this paper, we report a plain, important but often neglected fact that the objective function is influenced by calibration set difficulty, i.e., the ratio of the number of incorrectly classified samples to that of correctly classified samples. If a test set has a drastically different difficulty level from the calibration set, the optimal calibration parameters of the two datasets would be different. In other words, a calibrator optimal on the calibration set would be suboptimal on the OOD test set and thus has degraded performance. With this knowledge, we propose a simple and effective method named adaptive calibrator ensemble (ACE) to calibrate OOD datasets whose difficulty is usually higher than the calibration set. Specifically, two calibration functions are trained, one for in-distribution data (low difficulty), and the other for severely OOD data (high difficulty). To achieve desirable calibration on a new OOD dataset, ACE uses an adaptive weighting method that strikes a balance between the two extreme functions. When plugged in, ACE generally improves the performance of a few state-of-the-art calibration schemes on a series of OOD benchmarks. Importantly, such improvement does not come at the cost of the in-distribution calibration accuracy.
    FedREP: A Byzantine-Robust, Communication-Efficient and Privacy-Preserving Framework for Federated Learning. (arXiv:2303.05206v1 [cs.LG])
    Federated learning (FL) has recently become a hot research topic, in which Byzantine robustness, communication efficiency and privacy preservation are three important aspects. However, the tension among these three aspects makes it hard to simultaneously take all of them into account. In view of this challenge, we theoretically analyze the conditions that a communication compression method should satisfy to be compatible with existing Byzantine-robust methods and privacy-preserving methods. Motivated by the analysis results, we propose a novel communication compression method called consensus sparsification (ConSpar). To the best of our knowledge, ConSpar is the first communication compression method that is designed to be compatible with both Byzantine-robust methods and privacy-preserving methods. Based on ConSpar, we further propose a novel FL framework called FedREP, which is Byzantine-robust, communication-efficient and privacy-preserving. We theoretically prove the Byzantine robustness and the convergence of FedREP. Empirical results show that FedREP can significantly outperform communication-efficient privacy-preserving baselines. Furthermore, compared with Byzantine-robust communication-efficient baselines, FedREP can achieve comparable accuracy with the extra advantage of privacy preservation.
    ATM Fraud Detection using Streaming Data Analytics. (arXiv:2303.04946v1 [cs.LG])
    Gaining the trust and confidence of customers is the essence of the growth and success of financial institutions and organizations. Of late, the financial industry is significantly impacted by numerous instances of fraudulent activities. Further, owing to the generation of large voluminous datasets, it is highly essential that underlying framework is scalable and meet real time needs. To address this issue, in the study, we proposed ATM fraud detection in static and streaming contexts respectively. In the static context, we investigated a parallel and scalable machine learning algorithms for ATM fraud detection that is built on Spark and trained with a variety of machine learning (ML) models including Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Gradient Boosting Tree (GBT), and Multi-layer perceptron (MLP). We also employed several balancing techniques like Synthetic Minority Oversampling Technique (SMOTE) and its variants, Generative Adversarial Networks (GAN), to address the rarity in the dataset. In addition, we proposed a streaming based ATM fraud detection in the streaming context. Our sliding window based method collects ATM transactions that are performed within a specified time interval and then utilizes to train several ML models, including NB, RF, DT, and K-Nearest Neighbour (KNN). We selected these models based on their less model complexity and quicker response time. In both contexts, RF turned out to be the best model. RF obtained the best mean AUC of 0.975 in the static context and mean AUC of 0.910 in the streaming context. RF is also empirically proven to be statistically significant than the next-best performing models.
    Entropic Wasserstein Component Analysis. (arXiv:2303.05119v1 [stat.ML])
    Dimension reduction (DR) methods provide systematic approaches for analyzing high-dimensional data. A key requirement for DR is to incorporate global dependencies among original and embedded samples while preserving clusters in the embedding space. To achieve this, we combine the principles of optimal transport (OT) and principal component analysis (PCA). Our method seeks the best linear subspace that minimizes reconstruction error using entropic OT, which naturally encodes the neighborhood information of the samples. From an algorithmic standpoint, we propose an efficient block-majorization-minimization solver over the Stiefel manifold. Our experimental results demonstrate that our approach can effectively preserve high-dimensional clusters, leading to more interpretable and effective embeddings. Python code of the algorithms and experiments is available online.
    Real-time scheduling of renewable power systems through planning-based reinforcement learning. (arXiv:2303.05205v1 [cs.AI])
    The growing renewable energy sources have posed significant challenges to traditional power scheduling. It is difficult for operators to obtain accurate day-ahead forecasts of renewable generation, thereby requiring the future scheduling system to make real-time scheduling decisions aligning with ultra-short-term forecasts. Restricted by the computation speed, traditional optimization-based methods can not solve this problem. Recent developments in reinforcement learning (RL) have demonstrated the potential to solve this challenge. However, the existing RL methods are inadequate in terms of constraint complexity, algorithm performance, and environment fidelity. We are the first to propose a systematic solution based on the state-of-the-art reinforcement learning algorithm and the real power grid environment. The proposed approach enables planning and finer time resolution adjustments of power generators, including unit commitment and economic dispatch, thus increasing the grid's ability to admit more renewable energy. The well-trained scheduling agent significantly reduces renewable curtailment and load shedding, which are issues arising from traditional scheduling's reliance on inaccurate day-ahead forecasts. High-frequency control decisions exploit the existing units' flexibility, reducing the power grid's dependence on hardware transformations and saving investment and operating costs, as demonstrated in experimental results. This research exhibits the potential of reinforcement learning in promoting low-carbon and intelligent power systems and represents a solid step toward sustainable electricity generation.
    Reward Informed Dreamer for Task Generalization in Reinforcement Learning. (arXiv:2303.05092v1 [cs.LG])
    A long-standing goal of reinforcement learning is that algorithms can learn on training tasks and generalize well on unseen tasks like humans, where different tasks share similar dynamic with different reward functions. A general challenge is that it is nontrivial to quantitatively measure the similarities between these different tasks, which is vital for analyzing the task distribution and further designing algorithms with stronger generalization. To address this, we present a novel metric named Task Distribution Relevance (TDR) via optimal Q functions to capture the relevance of the task distribution quantitatively. In the case of tasks with a high TDR, i.e., the tasks differ significantly, we demonstrate that the Markovian policies cannot distinguish them, yielding poor performance accordingly. Based on this observation, we propose a framework of Reward Informed Dreamer (RID) with reward-informed world models, which captures invariant latent features over tasks and encodes reward signals into policies for distinguishing different tasks. In RID, we calculate the corresponding variational lower bound of the log-likelihood on the data, which includes a novel term to distinguish different tasks via states, based on reward-informed world models. Finally, extensive experiments in DeepMind control suite demonstrate that RID can significantly improve the performance of handling different tasks at the same time, especially for those with high TDR, and further generalize to unseen tasks effectively.
    Model-Agnostic Federated Learning. (arXiv:2303.04906v1 [cs.LG])
    Since its debut in 2016, Federated Learning (FL) has been tied to the inner workings of Deep Neural Networks (DNNs). On the one hand, this allowed its development and widespread use as DNNs proliferated. On the other hand, it neglected all those scenarios in which using DNNs is not possible or advantageous. The fact that most current FL frameworks only allow training DNNs reinforces this problem. To address the lack of FL solutions for non-DNN-based use cases, we propose MAFL (Model-Agnostic Federated Learning). MAFL marries a model-agnostic FL algorithm, AdaBoost.F, with an open industry-grade FL framework: Intel OpenFL. MAFL is the first FL system not tied to any specific type of machine learning model, allowing exploration of FL scenarios beyond DNNs and trees. We test MAFL from multiple points of view, assessing its correctness, flexibility and scaling properties up to 64 nodes. We optimised the base software achieving a 5.5x speedup on a standard FL scenario. MAFL is compatible with x86-64, ARM-v8, Power and RISC-V.
    Identification of Systematic Errors of Image Classifiers on Rare Subgroups. (arXiv:2303.05072v1 [cs.CV])
    Despite excellent average-case performance of many image classifiers, their performance can substantially deteriorate on semantically coherent subgroups of the data that were under-represented in the training data. These systematic errors can impact both fairness for demographic minority groups as well as robustness and safety under domain shift. A major challenge is to identify such subgroups with subpar performance when the subgroups are not annotated and their occurrence is very rare. We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups ("prompts") for subgroups where the target model has low performance on the prompt-conditioned synthesized data. To tackle the exponentially growing number of subgroups, we employ combinatorial testing. We denote this procedure as PromptAttack as it can be interpreted as an adversarial attack in a prompt space. We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy. Thereupon, we apply PromptAttack to ImageNet classifiers and identify novel systematic errors on rare subgroups.
    ESCL: Equivariant Self-Contrastive Learning for Sentence Representations. (arXiv:2303.05143v1 [cs.CL])
    Previous contrastive learning methods for sentence representations often focus on insensitive transformations to produce positive pairs, but neglect the role of sensitive transformations that are harmful to semantic representations. Therefore, we propose an Equivariant Self-Contrastive Learning (ESCL) method to make full use of sensitive transformations, which encourages the learned representations to be sensitive to certain types of transformations with an additional equivariant learning task. Meanwhile, in order to improve practicability and generality, ESCL simplifies the implementations of traditional equivariant contrastive methods to share model parameters from the perspective of multi-task learning. We evaluate our ESCL on semantic textual similarity tasks. The proposed method achieves better results while using fewer learning parameters compared to previous methods.
    Learning Human-Compatible Representations for Case-Based Decision Support. (arXiv:2303.04809v1 [cs.LG])
    Algorithmic case-based decision support provides examples to help human make sense of predicted labels and aid human in decision-making tasks. Despite the promising performance of supervised learning, representations learned by supervised models may not align well with human intuitions: what models consider as similar examples can be perceived as distinct by humans. As a result, they have limited effectiveness in case-based decision support. In this work, we incorporate ideas from metric learning with supervised learning to examine the importance of alignment for effective decision support. In addition to instance-level labels, we use human-provided triplet judgments to learn human-compatible decision-focused representations. Using both synthetic data and human subject experiments in multiple classification tasks, we demonstrate that such representation is better aligned with human perception than representation solely optimized for classification. Human-compatible representations identify nearest neighbors that are perceived as more similar by humans and allow humans to make more accurate predictions, leading to substantial improvements in human decision accuracies (17.8% in butterfly vs. moth classification and 13.2% in pneumonia classification).  ( 2 min )
    Curvature-Sensitive Predictive Coding with Approximate Laplace Monte Carlo. (arXiv:2303.04976v1 [cs.LG])
    Predictive coding (PC) accounts of perception now form one of the dominant computational theories of the brain, where they prescribe a general algorithm for inference and learning over hierarchical latent probabilistic models. Despite this, they have enjoyed little export to the broader field of machine learning, where comparative generative modelling techniques have flourished. In part, this has been due to the poor performance of models trained with PC when evaluated by both sample quality and marginal likelihood. By adopting the perspective of PC as a variational Bayes algorithm under the Laplace approximation, we identify the source of these deficits to lie in the exclusion of an associated Hessian term in the PC objective function, which would otherwise regularise the sharpness of the probability landscape and prevent over-certainty in the approximate posterior. To remedy this, we make three primary contributions: we begin by suggesting a simple Monte Carlo estimated evidence lower bound which relies on sampling from the Hessian-parameterised variational posterior. We then derive a novel block diagonal approximation to the full Hessian matrix that has lower memory requirements and favourable mathematical properties. Lastly, we present an algorithm that combines our method with standard PC to reduce memory complexity further. We evaluate models trained with our approach against the standard PC framework on image benchmark datasets. Our approach produces higher log-likelihoods and qualitatively better samples that more closely capture the diversity of the data-generating distribution.  ( 2 min )
    Finding Regularized Competitive Equilibria of Heterogeneous Agent Macroeconomic Models with Reinforcement Learning. (arXiv:2303.04833v1 [econ.GN])
    We study a heterogeneous agent macroeconomic model with an infinite number of households and firms competing in a labor market. Each household earns income and engages in consumption at each time step while aiming to maximize a concave utility subject to the underlying market conditions. The households aim to find the optimal saving strategy that maximizes their discounted cumulative utility given the market condition, while the firms determine the market conditions through maximizing corporate profit based on the household population behavior. The model captures a wide range of applications in macroeconomic studies, and we propose a data-driven reinforcement learning framework that finds the regularized competitive equilibrium of the model. The proposed algorithm enjoys theoretical guarantees in converging to the equilibrium of the market at a sub-linear rate.  ( 2 min )
    Smoothed Analysis of Sequential Probability Assignment. (arXiv:2303.04845v1 [cs.LG])
    We initiate the study of smoothed analysis for the sequential probability assignment problem with contexts. We study information-theoretically optimal minmax rates as well as a framework for algorithmic reduction involving the maximum likelihood estimator oracle. Our approach establishes a general-purpose reduction from minimax rates for sequential probability assignment for smoothed adversaries to minimax rates for transductive learning. This leads to optimal (logarithmic) fast rates for parametric classes and classes with finite VC dimension. On the algorithmic front, we develop an algorithm that efficiently taps into the MLE oracle, for general classes of functions. We show that under general conditions this algorithmic approach yields sublinear regret.  ( 2 min )
    Embodied Active Learning of Relational State Abstractions for Bilevel Planning. (arXiv:2303.04912v1 [cs.RO])
    State abstraction is an effective technique for planning in robotics environments with continuous states and actions, long task horizons, and sparse feedback. In object-oriented environments, predicates are a particularly useful form of state abstraction because of their compatibility with symbolic planners and their capacity for relational generalization. However, to plan with predicates, the agent must be able to interpret them in continuous environment states (i.e., ground the symbols). Manually programming predicate interpretations can be difficult, so we would instead like to learn them from data. We propose an embodied active learning paradigm where the agent learns predicate interpretations through online interaction with an expert. For example, after taking actions in a block stacking environment, the agent may ask the expert: "Is On(block1, block2) true?" From this experience, the agent learns to plan: it learns neural predicate interpretations, symbolic planning operators, and neural samplers that can be used for bilevel planning. During exploration, the agent plans to learn: it uses its current models to select actions towards generating informative expert queries. We learn predicate interpretations as ensembles of neural networks and use their entropy to measure the informativeness of potential queries. We evaluate this approach in three robotic environments and find that it consistently outperforms six baselines while exhibiting sample efficiency in two key metrics: number of environment interactions, and number of queries to the expert. Code: https://tinyurl.com/active-predicates  ( 2 min )
    Learning Representation for Anomaly Detection of Vehicle Trajectories. (arXiv:2303.05000v1 [cs.LG])
    Predicting the future trajectories of surrounding vehicles based on their history trajectories is a critical task in autonomous driving. However, when small crafted perturbations are introduced to those history trajectories, the resulting anomalous (or adversarial) trajectories can significantly mislead the future trajectory prediction module of the ego vehicle, which may result in unsafe planning and even fatal accidents. Therefore, it is of great importance to detect such anomalous trajectories of the surrounding vehicles for system safety, but few works have addressed this issue. In this work, we propose two novel methods for learning effective and efficient representations for online anomaly detection of vehicle trajectories. Different from general time-series anomaly detection, anomalous vehicle trajectory detection deals with much richer contexts on the road and fewer observable patterns on the anomalous trajectories themselves. To address these challenges, our methods exploit contrastive learning techniques and trajectory semantics to capture the patterns underlying the driving scenarios for effective anomaly detection under supervised and unsupervised settings, respectively. We conduct extensive experiments to demonstrate that our supervised method based on contrastive learning and unsupervised method based on reconstruction with semantic latent space can significantly improve the performance of anomalous trajectory detection in their corresponding settings over various baseline methods. We also demonstrate our methods' generalization ability to detect unseen patterns of anomalies.  ( 2 min )
    Deep Hypothesis Tests Detect Clinically Relevant Subgroup Shifts in Medical Images. (arXiv:2303.04862v1 [cs.LG])
    Distribution shifts remain a fundamental problem for the safe application of machine learning systems. If undetected, they may impact the real-world performance of such systems or will at least render original performance claims invalid. In this paper, we focus on the detection of subgroup shifts, a type of distribution shift that can occur when subgroups have a different prevalence during validation compared to the deployment setting. For example, algorithms developed on data from various acquisition settings may be predominantly applied in hospitals with lower quality data acquisition, leading to an inadvertent performance drop. We formulate subgroup shift detection in the framework of statistical hypothesis testing and show that recent state-of-the-art statistical tests can be effectively applied to subgroup shift detection on medical imaging data. We provide synthetic experiments as well as extensive evaluation on clinically meaningful subgroup shifts on histopathology as well as retinal fundus images. We conclude that classifier-based subgroup shift detection tests could be a particularly useful tool for post-market surveillance of deployed ML systems.  ( 2 min )
    Certifiable Robustness for Naive Bayes Classifiers. (arXiv:2303.04811v1 [cs.LG])
    Data cleaning is crucial but often laborious in most machine learning (ML) applications. However, task-agnostic data cleaning is sometimes unnecessary if certain inconsistencies in the dirty data will not affect the prediction of ML models to the test points. A test point is certifiably robust for an ML classifier if the prediction remains the same regardless of which (among exponentially many) cleaned dataset it is trained on. In this paper, we study certifiable robustness for the Naive Bayes classifier (NBC) on dirty datasets with missing values. We present (i) a linear time algorithm in the number of entries in the dataset that decides whether a test point is certifiably robust for NBC, (ii) an algorithm that counts for each label, the number of cleaned datasets on which the NBC can be trained to predict that label, and (iii) an efficient optimal algorithm that poisons a clean dataset by inserting the minimum number of missing values such that a test point is not certifiably robust for NBC. We prove that (iv) poisoning a clean dataset such that multiple test points become certifiably non-robust is NP-hard for any dataset with at least three features. Our experiments demonstrate that our algorithms for the decision and data poisoning problems achieve up to $19.5\times$ and $3.06\times$ speed-up over the baseline algorithms across different real-world datasets.  ( 2 min )
    You Only Crash Once: Improved Object Detection for Real-Time, Sim-to-Real Hazardous Terrain Detection and Classification for Autonomous Planetary Landings. (arXiv:2303.04891v1 [cs.CV])
    The detection of hazardous terrain during the planetary landing of spacecraft plays a critical role in assuring vehicle safety and mission success. A cheap and effective way of detecting hazardous terrain is through the use of visual cameras, which ensure operational ability from atmospheric entry through touchdown. Plagued by resource constraints and limited computational power, traditional techniques for visual hazardous terrain detection focus on template matching and registration to pre-built hazard maps. Although successful on previous missions, this approach is restricted to the specificity of the templates and limited by the fidelity of the underlying hazard map, which both require extensive pre-flight cost and effort to obtain and develop. Terrestrial systems that perform a similar task in applications such as autonomous driving utilize state-of-the-art deep learning techniques to successfully localize and classify navigation hazards. Advancements in spacecraft co-processors aimed at accelerating deep learning inference enable the application of these methods in space for the first time. In this work, we introduce You Only Crash Once (YOCO), a deep learning-based visual hazardous terrain detection and classification technique for autonomous spacecraft planetary landings. Through the use of unsupervised domain adaptation we tailor YOCO for training by simulation, removing the need for real-world annotated data and expensive mission surveying phases. We further improve the transfer of representative terrain knowledge between simulation and the real world through visual similarity clustering. We demonstrate the utility of YOCO through a series of terrestrial and extraterrestrial simulation-to-real experiments and show substantial improvements toward the ability to both detect and accurately classify instances of planetary terrain.  ( 2 min )
    Bayesian Causal Forests for Multivariate Outcomes: Application to Irish Data From an International Large Scale Education Assessment. (arXiv:2303.04874v1 [stat.ML])
    Bayesian Causal Forests (BCF) is a causal inference machine learning model based on a highly flexible non-parametric regression and classification tool called Bayesian Additive Regression Trees (BART). Motivated by data from the Trends in International Mathematics and Science Study (TIMSS), which includes data on student achievement in both mathematics and science, we present a multivariate extension of the BCF algorithm. With the help of simulation studies we show that our approach can accurately estimate causal effects for multiple outcomes subject to the same treatment. We also apply our model to Irish data from TIMSS 2019. Our findings reveal the positive effects of having access to a study desk at home (Mathematics ATE 95% CI: [0.20, 11.67]) while also highlighting the negative consequences of students often feeling hungry at school (Mathematics ATE 95% CI: [-11.15, -2.78] , Science ATE 95% CI: [-10.82,-1.72]) or often being absent (Mathematics ATE 95% CI: [-12.47, -1.55]).  ( 2 min )
    Convergence Rates for Localized Actor-Critic in Networked Markov Potential Games. (arXiv:2303.04865v1 [cs.LG])
    We introduce a class of networked Markov potential games where agents are associated with nodes in a network. Each agent has its own local potential function, and the reward of each agent depends only on the states and actions of agents within a $\kappa$-hop neighborhood. In this context, we propose a localized actor-critic algorithm. The algorithm is scalable since each agent uses only local information and does not need access to the global state. Further, the algorithm overcomes the curse of dimensionality through the use of function approximation. Our main results provide finite-sample guarantees up to a localization error and a function approximation error. Specifically, we achieve an $\tilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity measured by the averaged Nash regret. This is the first finite-sample bound for multi-agent competitive games that does not depend on the number of agents.  ( 2 min )
    On the Benefits of Biophysical Synapses. (arXiv:2303.04944v1 [cs.NE])
    The approximation capability of ANNs and their RNN instantiations, is strongly correlated with the number of parameters packed into these networks. However, the complexity barrier for human understanding, is arguably related to the number of neurons and synapses in the networks, and to the associated nonlinear transformations. In this paper we show that the use of biophysical synapses, as found in LTCs, have two main benefits. First, they allow to pack more parameters for a given number of neurons and synapses. Second, they allow to formulate the nonlinear-network transformation, as a linear system with state-dependent coefficients. Both increase interpretability, as for a given task, they allow to learn a system linear in its input features, that is smaller in size compared to the state of the art. We substantiate the above claims on various time-series prediction tasks, but we believe that our results are applicable to any feedforward or recurrent ANN.  ( 2 min )
    DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks. (arXiv:2303.04878v1 [cs.LG])
    Deep neural networks (DNNs) are widely used in various application domains such as image processing, speech recognition, and natural language processing. However, testing DNN models may be challenging due to the complexity and size of their input domain. Particularly, testing DNN models often requires generating or exploring large unlabeled datasets. In practice, DNN test oracles, which identify the correct outputs for inputs, often require expensive manual effort to label test data, possibly involving multiple experts to ensure labeling correctness. In this paper, we propose DeepGD, a black-box multi-objective test selection approach for DNN models. It reduces the cost of labeling by prioritizing the selection of test inputs with high fault revealing power from large unlabeled datasets. DeepGD not only selects test inputs with high uncertainty scores to trigger as many mispredicted inputs as possible but also maximizes the probability of revealing distinct faults in the DNN model by selecting diverse mispredicted inputs. The experimental results conducted on four widely used datasets and five DNN models show that in terms of fault-revealing ability: (1) White-box, coverage-based approaches fare poorly, (2) DeepGD outperforms existing black-box test selection approaches in terms of fault detection, and (3) DeepGD also leads to better guidance for DNN model retraining when using selected inputs to augment the training set.  ( 2 min )
    nl2spec: Interactively Translating Unstructured Natural Language to Temporal Logics with Large Language Models. (arXiv:2303.04864v1 [cs.LO])
    A rigorous formalization of desired system requirements is indispensable when performing any verification task. This often limits the application of verification techniques, as writing formal specifications is an error-prone and time-consuming manual task. To facilitate this, we present nl2spec, a framework for applying Large Language Models (LLMs) to derive formal specifications (in temporal logics) from unstructured natural language. In particular, we introduce a new methodology to detect and resolve the inherent ambiguity of system requirements in natural language: we utilize LLMs to map subformulas of the formalization back to the corresponding natural language fragments of the input. Users iteratively add, delete, and edit these sub-translations to amend erroneous formalizations, which is easier than manually redrafting the entire formalization. The framework is agnostic to specific application domains and can be extended to similar specification languages and new neural models. We perform a user study to obtain a challenging dataset, which we use to run experiments on the quality of translations. We provide an open-source implementation, including a web-based frontend.  ( 2 min )
    Blackwell's Approachability with Time-Dependent Outcome Functions and Dot Products. Application to the Big Match. (arXiv:2303.04956v1 [math.OC])
    Blackwell's approachability is a very general sequential decision framework where a Decision Maker obtains vector-valued outcomes, and aims at the convergence of the average outcome to a given "target" set. Blackwell gave a sufficient condition for the decision maker having a strategy guaranteeing such a convergence against an adversarial environment, as well as what we now call the Blackwell's algorithm, which then ensures convergence. Blackwell's approachability has since been applied to numerous problems, in online learning and game theory, in particular. We extend this framework by allowing the outcome function and the dot product to be time-dependent. We establish a general guarantee for the natural extension to this framework of Blackwell's algorithm. In the case where the target set is an orthant, we present a family of time-dependent dot products which yields different convergence speeds for each coordinate of the average outcome. We apply this framework to the Big Match (one of the most important toy examples of stochastic games) where an $\epsilon$-uniformly optimal strategy for Player I is given by Blackwell's algorithm in a well-chosen auxiliary approachability problem.  ( 2 min )
    Improved Regret Bounds for Online Kernel Selection under Bandit Feedback. (arXiv:2303.05018v1 [cs.LG])
    In this paper, we improve the regret bound for online kernel selection under bandit feedback. Previous algorithm enjoys a $O((\Vert f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds improving the previous bound. For smooth loss functions, we propose an algorithm with a $O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^K_{i=1}L_T(f^\ast_i))^{\frac{2}{3}})$ expected bound where $L_T(f^\ast_i)$ is the cumulative losses of optimal hypothesis in $\mathbb{H}_{i}=\{f\in\mathcal{H}_i:\Vert f\Vert_{\mathcal{H}_i}\leq U\}$. The data-dependent bound keeps the previous worst-case bound and is smaller if most of candidate kernels match well with the data. For Lipschitz loss functions, we propose an algorithm with a $O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound asymptotically improving the previous bound. We apply the two algorithms to online kernel selection with time constraint and prove new regret bounds matching or improving the previous $O(\sqrt{T\ln{K}} +\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.  ( 2 min )
    Phase transition for detecting a small community in a large network. (arXiv:2303.05024v1 [math.ST])
    How to detect a small community in a large network is an interesting problem, including clique detection as a special case, where a naive degree-based $\chi^2$-test was shown to be powerful in the presence of an Erd\H{o}s-Renyi background. Using Sinkhorn's theorem, we show that the signal captured by the $\chi^2$-test may be a modeling artifact, and it may disappear once we replace the Erd\H{o}s-Renyi model by a broader network model. We show that the recent SgnQ test is more appropriate for such a setting. The test is optimal in detecting communities with sizes comparable to the whole network, but has never been studied for our setting, which is substantially different and more challenging. Using a degree-corrected block model (DCBM), we establish phase transitions of this testing problem concerning the size of the small community and the edge densities in small and large communities. When the size of the small community is larger than $\sqrt{n}$, the SgnQ test is optimal for it attains the computational lower bound (CLB), the information lower bound for methods allowing polynomial computation time. When the size of the small community is smaller than $\sqrt{n}$, we establish the parameter regime where the SgnQ test has full power and make some conjectures of the CLB. We also study the classical information lower bound (LB) and show that there is always a gap between the CLB and LB in our range of interest.  ( 2 min )
    Reverse Engineering Breast MRIs: Predicting Acquisition Parameters Directly from Images. (arXiv:2303.04911v1 [eess.IV])
    The image acquisition parameters (IAPs) used to create MRI scans are central to defining the appearance of the images. Deep learning models trained on data acquired using certain parameters might not generalize well to images acquired with different parameters. Being able to recover such parameters directly from an image could help determine whether a deep learning model is applicable, and could assist with data harmonization and/or domain adaptation. Here, we introduce a neural network model that can predict many complex IAPs used to generate an MR image with high accuracy solely using the image, with a single forward pass. These predicted parameters include field strength, echo and repetition times, acquisition matrix, scanner model, scan options, and others. Even challenging parameters such as contrast agent type can be predicted with good accuracy. We perform a variety of experiments and analyses of our model's ability to predict IAPs on many MRI scans of new patients, and demonstrate its usage in a realistic application. Predicting IAPs from the images is an important step toward better understanding the relationship between image appearance and IAPs. This in turn will advance the understanding of many concepts related to the generalizability of neural network models on medical images, including domain shift, domain adaptation, and data harmonization.  ( 2 min )
    Agnostic PAC Learning of k-juntas Using L2-Polynomial Regression. (arXiv:2303.04859v1 [cs.LG])
    Many conventional learning algorithms rely on loss functions other than the natural 0-1 loss for computational efficiency and theoretical tractability. Among them are approaches based on absolute loss (L1 regression) and square loss (L2 regression). The first is proved to be an \textit{agnostic} PAC learner for various important concept classes such as \textit{juntas}, and \textit{half-spaces}. On the other hand, the second is preferable because of its computational efficiency, which is linear in the sample size. However, PAC learnability is still unknown as guarantees have been proved only under distributional restrictions. The question of whether L2 regression is an agnostic PAC learner for 0-1 loss has been open since 1993 and yet has to be answered. This paper resolves this problem for the junta class on the Boolean cube -- proving agnostic PAC learning of k-juntas using L2 polynomial regression. Moreover, we present a new PAC learning algorithm based on the Boolean Fourier expansion with lower computational complexity. Fourier-based algorithms, such as Linial et al. (1993), have been used under distributional restrictions, such as uniform distribution. We show that with an appropriate change, one can apply those algorithms in agnostic settings without any distributional assumption. We prove our results by connecting the PAC learning with 0-1 loss to the minimum mean square estimation (MMSE) problem. We derive an elegant upper bound on the 0-1 loss in terms of the MMSE error and show that the sign of the MMSE is a PAC learner for any concept class containing it.  ( 2 min )
    High Fidelity Synthetic Face Generation for Rosacea Skin Condition from Limited Data. (arXiv:2303.04839v1 [cs.CV])
    Similar to the majority of deep learning applications, diagnosing skin diseases using computer vision and deep learning often requires a large volume of data. However, obtaining sufficient data for particular types of facial skin conditions can be difficult due to privacy concerns. As a result, conditions like Rosacea are often understudied in computer-aided diagnosis. The limited availability of data for facial skin conditions has led to the investigation of alternative methods for computer-aided diagnosis. In recent years, Generative Adversarial Networks (GANs), mainly variants of StyleGANs, have demonstrated promising results in generating synthetic facial images. In this study, for the first time, a small dataset of Rosacea with 300 full-face images is utilized to further investigate the possibility of generating synthetic data. The preliminary experiments show how fine-tuning the model and varying experimental settings significantly affect the fidelity of the Rosacea features. It is demonstrated that $R_1$ Regularization strength helps achieve high-fidelity details. Additionally, this study presents qualitative evaluations of synthetic/generated faces by expert dermatologists and non-specialist participants. The quantitative evaluation is presented using a few validation metric(s). Furthermore a number of limitations and future directions are discussed. Code and generated dataset are available at: \url{https://github.com/thinkercache/stylegan2-ada-pytorch}  ( 2 min )
  • Open

    Fast post-process Bayesian inference with Sparse Variational Bayesian Monte Carlo. (arXiv:2303.05263v1 [stat.ML])
    We introduce Sparse Variational Bayesian Monte Carlo (SVBMC), a method for fast "post-process" Bayesian inference for models with black-box and potentially noisy likelihoods. SVBMC reuses all existing target density evaluations -- for example, from previous optimizations or partial Markov Chain Monte Carlo runs -- to build a sparse Gaussian process (GP) surrogate model of the log posterior density. Uncertain regions of the surrogate are then refined via active learning as needed. Our work builds on the Variational Bayesian Monte Carlo (VBMC) framework for sample-efficient inference, with several novel contributions. First, we make VBMC scalable to a large number of pre-existing evaluations via sparse GP regression, deriving novel Bayesian quadrature formulae and acquisition functions for active learning with sparse GPs. Second, we introduce noise shaping, a general technique to induce the sparse GP approximation to focus on high posterior density regions. Third, we prove theoretical results in support of the SVBMC refinement procedure. We validate our method on a variety of challenging synthetic scenarios and real-world applications. We find that SVBMC consistently builds good posterior approximations by post-processing of existing model evaluations from different sources, often requiring only a small number of additional density evaluations.  ( 2 min )
    Bayesian Causal Forests for Multivariate Outcomes: Application to Irish Data From an International Large Scale Education Assessment. (arXiv:2303.04874v1 [stat.ML])
    Bayesian Causal Forests (BCF) is a causal inference machine learning model based on a highly flexible non-parametric regression and classification tool called Bayesian Additive Regression Trees (BART). Motivated by data from the Trends in International Mathematics and Science Study (TIMSS), which includes data on student achievement in both mathematics and science, we present a multivariate extension of the BCF algorithm. With the help of simulation studies we show that our approach can accurately estimate causal effects for multiple outcomes subject to the same treatment. We also apply our model to Irish data from TIMSS 2019. Our findings reveal the positive effects of having access to a study desk at home (Mathematics ATE 95% CI: [0.20, 11.67]) while also highlighting the negative consequences of students often feeling hungry at school (Mathematics ATE 95% CI: [-11.15, -2.78] , Science ATE 95% CI: [-10.82,-1.72]) or often being absent (Mathematics ATE 95% CI: [-12.47, -1.55]).  ( 2 min )
    The joint node degree distribution in the Erd\H{o}s-R\'enyi network. (arXiv:2303.05138v1 [stat.ML])
    The Erd\H{o}s-R\'enyi random graph is the simplest model for node degree distribution, and it is one of the most widely studied. In this model, pairs of $n$ vertices are selected and connected uniformly at random with probability $p$, consequently, the degrees for a given vertex follow the binomial distribution. If the number of vertices is large, the binomial can be approximated by Normal using the Central Limit Theorem, which is often allowed when $\min (np, n(1-p)) > 5$. This is true for every node independently. However, due to the fact that the degrees of nodes in a graph are not independent, we aim in this paper to test whether the degrees of per node collectively in the Erd\H{o}s-R\'enyi graph have a multivariate normal distribution MVN. A chi square goodness of fit test for the hypothesis that binomial is a distribution for the whole set of nodes is rejected because of the dependence between degrees. Before testing MVN we show that the covariance and correlation between the degrees of any pair of nodes in the graph are $p(1-p)$ and $1/(n-1)$, respectively. We test MVN considering two assumptions: independent and dependent degrees, and we obtain our results based on the percentages of rejected statistics of chi square, the $p$-values of Anderson Darling test, and a CDF comparison. We always achieve a good fit of multivariate normal distribution with large values of $n$ and $p$, and very poor fit when $n$ or $p$ are very small. The approximation seems valid when $np \geq 10$. We also compare the maximum likelihood estimate of $p$ in MVN distribution where we assume independence and dependence. The estimators are assessed using bias, variance and mean square error.  ( 2 min )
    Entropic Wasserstein Component Analysis. (arXiv:2303.05119v1 [stat.ML])
    Dimension reduction (DR) methods provide systematic approaches for analyzing high-dimensional data. A key requirement for DR is to incorporate global dependencies among original and embedded samples while preserving clusters in the embedding space. To achieve this, we combine the principles of optimal transport (OT) and principal component analysis (PCA). Our method seeks the best linear subspace that minimizes reconstruction error using entropic OT, which naturally encodes the neighborhood information of the samples. From an algorithmic standpoint, we propose an efficient block-majorization-minimization solver over the Stiefel manifold. Our experimental results demonstrate that our approach can effectively preserve high-dimensional clusters, leading to more interpretable and effective embeddings. Python code of the algorithms and experiments is available online.  ( 2 min )
    Data-dependent Generalization Bounds via Variable-Size Compressibility. (arXiv:2303.05369v1 [stat.ML])
    In this paper, we establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework that we introduce newly here. In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data. This is shown to yield bounds that depend on the empirical measure of the given input data at hand, rather than its unknown distribution. Our new generalization bounds that we establish are tail bounds, tail bounds on the expectation, and in-expectations bounds. Moreover, it is shown that our framework also allows to derive general bounds on any function of the input data and output hypothesis random variables. In particular, these general bounds are shown to subsume and possibly improve over several existing PAC-Bayes and data-dependent intrinsic dimension-based bounds that are recovered as special cases, thus unveiling a unifying character of our approach. For instance, a new data-dependent intrinsic dimension based bounds is established, which connects the generalization error to the optimization trajectories and reveals various interesting connections with rate-distortion dimension of process, R\'enyi information dimension of process, and metric mean dimension.
    Continual Learning for Monolingual End-to-End Automatic Speech Recognition. (arXiv:2112.09427v4 [eess.AS] UPDATED)
    Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of performance on the original domain(s), a phenomenon called Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from CF, making them unable to be continually enhanced without storing all past data. Fortunately, Continual Learning (CL) methods, which aim to enable continual adaptation while overcoming CF, can be used. In this paper, we implement an extensive number of CL methods for End-to-End ASR and test and compare their ability to extend a monolingual Hybrid CTC-Transformer model across four new tasks. We find that the best performing CL method closes the gap between the fine-tuned model (lower bound) and the model trained jointly on all tasks (upper bound) by more than 40%, while requiring access to only 0.6% of the original data.
    Local Convolutions Cause an Implicit Bias towards High Frequency Adversarial Examples. (arXiv:2006.11440v5 [stat.ML] UPDATED)
    Adversarial Attacks are still a significant challenge for neural networks. Recent work has shown that adversarial perturbations typically contain high-frequency features, but the root cause of this phenomenon remains unknown. Inspired by theoretical work on linear full-width convolutional models, we hypothesize that the local (i.e. bounded-width) convolutional operations commonly used in current neural networks are implicitly biased to learn high frequency features, and that this is one of the root causes of high frequency adversarial examples. To test this hypothesis, we analyzed the impact of different choices of linear and nonlinear architectures on the implicit bias of the learned features and the adversarial perturbations, in both spatial and frequency domains. We find that the high-frequency adversarial perturbations are critically dependent on the convolution operation because the spatially-limited nature of local convolutions induces an implicit bias towards high frequency features. The explanation for the latter involves the Fourier Uncertainty Principle: a spatially-limited (local in the space domain) filter cannot also be frequency-limited (local in the frequency domain). Furthermore, using larger convolution kernel sizes or avoiding convolutions (e.g. by using Vision Transformers architecture) significantly reduces this high frequency bias, but not the overall susceptibility to attacks. Looking forward, our work strongly suggests that understanding and controlling the implicit bias of architectures will be essential for achieving adversarial robustness.
    Sparse and Local Networks for Hypergraph Reasoning. (arXiv:2303.05496v1 [cs.LG])
    Reasoning about the relationships between entities from input facts (e.g., whether Ari is a grandparent of Charlie) generally requires explicit consideration of other entities that are not mentioned in the query (e.g., the parents of Charlie). In this paper, we present an approach for learning to solve problems of this kind in large, real-world domains, using sparse and local hypergraph neural networks (SpaLoc). SpaLoc is motivated by two observations from traditional logic-based reasoning: relational inferences usually apply locally (i.e., involve only a small number of individuals), and relations are usually sparse (i.e., only hold for a small percentage of tuples in a domain). We exploit these properties to make learning and inference efficient in very large domains by (1) using a sparse tensor representation for hypergraph neural networks, (2) applying a sparsification loss during training to encourage sparse representations, and (3) subsampling based on a novel information sufficiency-based sampling process during training. SpaLoc achieves state-of-the-art performance on several real-world, large-scale knowledge graph reasoning benchmarks, and is the first framework for applying hypergraph neural networks on real-world knowledge graphs with more than 10k nodes.
    On the Expressiveness and Generalization of Hypergraph Neural Networks. (arXiv:2303.05490v1 [cs.LG])
    This extended abstract describes a framework for analyzing the expressiveness, learning, and (structural) generalization of hypergraph neural networks (HyperGNNs). Specifically, we focus on how HyperGNNs can learn from finite datasets and generalize structurally to graph reasoning problems of arbitrary input sizes. Our first contribution is a fine-grained analysis of the expressiveness of HyperGNNs, that is, the set of functions that they can realize. Our result is a hierarchy of problems they can solve, defined in terms of various hyperparameters such as depths and edge arities. Next, we analyze the learning properties of these neural networks, especially focusing on how they can be trained on a finite set of small graphs and generalize to larger graphs, which we term structural generalization. Our theoretical results are further supported by the empirical results.
    Improving Open-Set Semi-Supervised Learning with Self-Supervision. (arXiv:2301.10127v2 [cs.LG] UPDATED)
    Open-set semi-supervised learning (OSSL) is a realistic setting of semi-supervised learning where the unlabeled training set contains classes that are not present in the labeled set. Many existing OSSL methods assume that these out-of-distribution data are harmful and put effort into excluding data from unknown classes from the training objective. In contrast, we propose an OSSL framework that facilitates learning from all unlabeled data through self-supervision. Additionally, we utilize an energy-based score to accurately recognize data belonging to the known classes, making our method well-suited for handling uncurated data in deployment. We show through extensive experimental evaluations on several datasets that our method shows overall unmatched robustness and performance in terms of closed-set accuracy and open-set recognition compared with state-of-the-art for OSSL. Our code will be released upon publication.
    Learning Rational Subgoals from Demonstrations and Instructions. (arXiv:2303.05487v1 [cs.AI])
    We present a framework for learning useful subgoals that support efficient long-term planning to achieve novel goals. At the core of our framework is a collection of rational subgoals (RSGs), which are essentially binary classifiers over the environmental states. RSGs can be learned from weakly-annotated data, in the form of unsegmented demonstration trajectories, paired with abstract task descriptions, which are composed of terms initially unknown to the agent (e.g., collect-wood then craft-boat then go-across-river). Our framework also discovers dependencies between RSGs, e.g., the task collect-wood is a helpful subgoal for the task craft-boat. Given a goal description, the learned subgoals and the derived dependencies facilitate off-the-shelf planning algorithms, such as A* and RRT, by setting helpful subgoals as waypoints to the planner, which significantly improves performance-time efficiency.
    Predictive Inference with Feature Conformal Prediction. (arXiv:2210.00173v2 [cs.LG] UPDATED)
    Conformal prediction is a distribution-free technique for establishing valid prediction intervals. Although conventionally people conduct conformal prediction in the output space, this is not the only possibility. In this paper, we propose feature conformal prediction, which extends the scope of conformal prediction to semantic feature spaces by leveraging the inductive bias of deep representation learning. From a theoretical perspective, we demonstrate that feature conformal prediction provably outperforms regular conformal prediction under mild assumptions. Our approach could be combined with not only vanilla conformal prediction, but also other adaptive conformal prediction methods. Apart from experiments on existing predictive inference benchmarks, we also demonstrate the state-of-the-art performance of the proposed methods on large-scale tasks such as ImageNet classification and Cityscapes image segmentation.
    Generalized Balancing Weights via Deep Neural Networks. (arXiv:2211.07533v4 [stat.ML] UPDATED)
    We present generalized balancing weights, Neural Balancing Weights (NBW), to estimate the causal effects for an arbitrary mixture of discrete and continuous interventions. The weights were obtained by directly estimating the density ratio between the source and balanced distributions by optimizing the variational representation of $f$-divergence. For this, we selected $\alpha$-divergence since it has good properties for optimization: It has an estimator whose sample complexity is independent of it's ground truth value and unbiased mini-batch gradients and is advantageous for the vanishing gradient problem. In addition, we provide a method for checking the balance of the distribution changed by the weights. If the balancing is imperfect, the weights can be improved by adding new balancing weights. Our method can be conveniently implemented with any present deep-learning libraries, and weights can be used in most state-of-the-art supervised algorithms. The code for our method is available online.
    Weakly Supervised Knowledge Transfer with Probabilistic Logical Reasoning for Object Detection. (arXiv:2303.05148v1 [cs.CV])
    Training object detection models usually requires instance-level annotations, such as the positions and labels of all objects present in each image. Such supervision is unfortunately not always available and, more often, only image-level information is provided, also known as weak supervision. Recent works have addressed this limitation by leveraging knowledge from a richly annotated domain. However, the scope of weak supervision supported by these approaches has been very restrictive, preventing them to use all available information. In this work, we propose ProbKT, a framework based on probabilistic logical reasoning that allows to train object detection models with arbitrary types of weak supervision. We empirically show on different datasets that using all available information is beneficial as our ProbKT leads to significant improvement on target domain and better generalization compared to existing baselines. We also showcase the ability of our approach to handle complex logic statements as supervision signal.
    Building Normalizing Flows with Stochastic Interpolants. (arXiv:2209.15571v3 [cs.LG] UPDATED)
    A generative model based on a continuous-time normalizing flow between any pair of base and target probability densities is proposed. The velocity field of this flow is inferred from the probability current of a time-dependent density that interpolates between the base and the target in finite time. Unlike conventional normalizing flow inference methods based the maximum likelihood principle, which require costly backpropagation through ODE solvers, our interpolant approach leads to a simple quadratic loss for the velocity itself which is expressed in terms of expectations that are readily amenable to empirical estimation. The flow can be used to generate samples from either the base or target, and to estimate the likelihood at any time along the interpolant. In addition, the flow can be optimized to minimize the path length of the interpolant density, thereby paving the way for building optimal transport maps. In situations where the base is a Gaussian density, we also show that the velocity of our normalizing flow can also be used to construct a diffusion model to sample the target as well as estimate its score. However, our approach shows that we can bypass this diffusion completely and work at the level of the probability flow with greater simplicity, opening an avenue for methods based solely on ordinary differential equations as an alternative to those based on stochastic differential equations. Benchmarking on density estimation tasks illustrates that the learned flow can match and surpass conventional continuous flows at a fraction of the cost, and compares well with diffusions on image generation on CIFAR-10 and ImageNet $32\times32$. The method scales ab-initio ODE flows to previously unreachable image resolutions, demonstrated up to $128\times128$.
    Kernel Regression with Infinite-Width Neural Networks on Millions of Examples. (arXiv:2303.05420v1 [stat.ML])
    Neural kernels have drastically increased performance on diverse and nonstandard data modalities but require significantly more compute, which previously limited their application to smaller datasets. In this work, we address this by massively parallelizing their computation across many GPUs. We combine this with a distributed, preconditioned conjugate gradients algorithm to enable kernel regression at a large scale (i.e. up to five million examples). Using this approach, we study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset. Using data augmentation to expand the original CIFAR-10 training dataset by a factor of 20, we obtain a test accuracy of 91.2\% (SotA for a pure kernel method). Moreover, we explore neural kernels on other data modalities, obtaining results on protein and small molecule prediction tasks that are competitive with SotA methods.
    Communication-Efficient Collaborative Heterogeneous Bandits in Networks. (arXiv:2303.05445v1 [cs.LG])
    The multi-agent multi-armed bandit problem has been studied extensively due to its ubiquity in many real-life applications, such as online recommendation systems and wireless networking. We consider the setting where agents should minimize their group regret while collaborating over a given graph via some communication protocol and where each agent is given a different set of arms. Previous literature on this problem only considered one of the two desired features separately: agents with the same arm set communicate over a general graph, or agents with different arm sets communicate over a fully connected graph. In this work, we introduce a more general problem setting that encompasses all the desired features. For this novel setting, we first provide a rigorous regret analysis for the standard flooding protocol combined with the UCB policy. Then, to mitigate the issue of high communication costs incurred by flooding, we propose a new protocol called Flooding with Absorption (FWA). We provide a theoretical analysis of the regret bound and intuitions on the advantages of using FWA over flooding. Lastly, we verify empirically that using FWA leads to significantly lower communication costs despite minimal regret performance loss compared to flooding.
    PDSketch: Integrated Planning Domain Programming and Learning. (arXiv:2303.05501v1 [cs.AI])
    This paper studies a model learning and online planning approach towards building flexible and general robots. Specifically, we investigate how to exploit the locality and sparsity structures in the underlying environmental transition model to improve model generalization, data-efficiency, and runtime-efficiency. We present a new domain definition language, named PDSketch. It allows users to flexibly define high-level structures in the transition models, such as object and feature dependencies, in a way similar to how programmers use TensorFlow or PyTorch to specify kernel sizes and hidden dimensions of a convolutional neural network. The details of the transition model will be filled in by trainable neural networks. Based on the defined structures and learned parameters, PDSketch automatically generates domain-independent planning heuristics without additional training. The derived heuristics accelerate the performance-time planning for novel goals.
    SAM as an Optimal Relaxation of Bayes. (arXiv:2210.01620v2 [cs.LG] UPDATED)
    Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can drastically improve generalization, but their underlying mechanisms are not yet fully understood. Here, we establish SAM as a relaxation of the Bayes objective where the expected negative-loss is replaced by the optimal convex lower bound, obtained by using the so-called Fenchel biconjugate. The connection enables a new Adam-like extension of SAM to automatically obtain reasonable uncertainty estimates, while sometimes also improving its accuracy. By connecting adversarial and Bayesian methods, our work opens a new path to robustness.
    Optimal Algorithms for Latent Bandits with Cluster Structure. (arXiv:2301.07040v2 [cs.LG] UPDATED)
    We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into \emph{latent} clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late \cite{gentile2014online, maillard2014latent}. Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathsf{MNT}})$ is unavoidable, where $\mathsf{M}, \mathsf{N}$ are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$, when the number of clusters is $\widetilde{O}(1)$. This is the first algorithm to guarantee such strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log{\mathsf{T}})$ calls to an offline matrix completion oracle across all $\mathsf{T}$ rounds.
    Asynchronous and Error-prone Longitudinal Data Analysis via Functional Calibration. (arXiv:2209.13807v2 [stat.ME] UPDATED)
    In many longitudinal settings, time-varying covariates may not be measured at the same time as responses and are often prone to measurement error. Naive last-observation-carried-forward methods incur estimation biases, and existing kernel-based methods suffer from slow convergence rates and large variations. To address these challenges, we propose a new functional calibration approach to efficiently learn longitudinal covariate processes based on sparse functional data with measurement error. Our approach, stemming from functional principal component analysis, calibrates the unobserved synchronized covariate values from the observed asynchronous and error-prone covariate values, and is broadly applicable to asynchronous longitudinal regression with time-invariant or time-varying coefficients. For regression with time-invariant coefficients, our estimator is asymptotically unbiased, root-n consistent, and asymptotically normal; for time-varying coefficient models, our estimator has the optimal varying coefficient model convergence rate with inflated asymptotic variance from the calibration. In both cases, our estimators present asymptotic properties superior to the existing methods. The feasibility and usability of the proposed methods are verified by simulations and an application to the Study of Women's Health Across the Nation, a large-scale multi-site longitudinal study on women's health during mid-life.
    A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta. (arXiv:2206.11124v2 [cs.LG] UPDATED)
    Mini-batch SGD with momentum is a fundamental algorithm for learning large predictive models. In this paper we develop a new analytic framework to analyze noise-averaged properties of mini-batch SGD for linear models at constant learning rates, momenta and sizes of batches. Our key idea is to consider the dynamics of the second moments of model parameters for a special family of "Spectrally Expressible" approximations. This allows to obtain an explicit expression for the generating function of the sequence of loss values. By analyzing this generating function, we find, in particular, that 1) the SGD dynamics exhibits several convergent and divergent regimes depending on the spectral distributions of the problem; 2) the convergent regimes admit explicit stability conditions, and explicit loss asymptotics in the case of power-law spectral distributions; 3) the optimal convergence rate can be achieved at negative momenta. We verify our theoretical predictions by extensive experiments with MNIST, CIFAR10 and synthetic problems, and find a good quantitative agreement.
    Phase transition for detecting a small community in a large network. (arXiv:2303.05024v1 [math.ST])
    How to detect a small community in a large network is an interesting problem, including clique detection as a special case, where a naive degree-based $\chi^2$-test was shown to be powerful in the presence of an Erd\H{o}s-Renyi background. Using Sinkhorn's theorem, we show that the signal captured by the $\chi^2$-test may be a modeling artifact, and it may disappear once we replace the Erd\H{o}s-Renyi model by a broader network model. We show that the recent SgnQ test is more appropriate for such a setting. The test is optimal in detecting communities with sizes comparable to the whole network, but has never been studied for our setting, which is substantially different and more challenging. Using a degree-corrected block model (DCBM), we establish phase transitions of this testing problem concerning the size of the small community and the edge densities in small and large communities. When the size of the small community is larger than $\sqrt{n}$, the SgnQ test is optimal for it attains the computational lower bound (CLB), the information lower bound for methods allowing polynomial computation time. When the size of the small community is smaller than $\sqrt{n}$, we establish the parameter regime where the SgnQ test has full power and make some conjectures of the CLB. We also study the classical information lower bound (LB) and show that there is always a gap between the CLB and LB in our range of interest.
    StyleDiff: Attribute Comparison Between Unlabeled Datasets in Latent Disentangled Space. (arXiv:2303.05102v1 [stat.ML])
    One major challenge in machine learning applications is coping with mismatches between the datasets used in the development and those obtained in real-world applications. These mismatches may lead to inaccurate predictions and errors, resulting in poor product quality and unreliable systems. In this study, we propose StyleDiff to inform developers of the differences between the two datasets for the steady development of machine learning systems. Using disentangled image spaces obtained from recently proposed generative models, StyleDiff compares the two datasets by focusing on attributes in the images and provides an easy-to-understand analysis of the differences between the datasets. The proposed StyleDiff performs in $O (d N\log N)$, where $N$ is the size of the datasets and $d$ is the number of attributes, enabling the application to large datasets. We demonstrate that StyleDiff accurately detects differences between datasets and presents them in an understandable format using, for example, driving scenes datasets.
    Computable Phenotypes to Characterize Changing Patient Brain Dysfunction in the Intensive Care Unit. (arXiv:2303.05504v1 [q-bio.QM])
    In the United States, more than 5 million patients are admitted annually to ICUs, with ICU mortality of 10%-29% and costs over $82 billion. Acute brain dysfunction status, delirium, is often underdiagnosed or undervalued. This study's objective was to develop automated computable phenotypes for acute brain dysfunction states and describe transitions among brain dysfunction states to illustrate the clinical trajectories of ICU patients. We created two single-center, longitudinal EHR datasets for 48,817 adult patients admitted to an ICU at UFH Gainesville (GNV) and Jacksonville (JAX). We developed algorithms to quantify acute brain dysfunction status including coma, delirium, normal, or death at 12-hour intervals of each ICU admission and to identify acute brain dysfunction phenotypes using continuous acute brain dysfunction status and k-means clustering approach. There were 49,770 admissions for 37,835 patients in UFH GNV dataset and 18,472 admissions for 10,982 patients in UFH JAX dataset. In total, 18% of patients had coma as the worst brain dysfunction status; every 12 hours, around 4%-7% would transit to delirium, 22%-25% would recover, 3%-4% would expire, and 67%-68% would remain in a coma in the ICU. Additionally, 7% of patients had delirium as the worst brain dysfunction status; around 6%-7% would transit to coma, 40%-42% would be no delirium, 1% would expire, and 51%-52% would remain delirium in the ICU. There were three phenotypes: persistent coma/delirium, persistently normal, and transition from coma/delirium to normal almost exclusively in first 48 hours after ICU admission. We developed phenotyping scoring algorithms that determined acute brain dysfunction status every 12 hours while admitted to the ICU. This approach may be useful in developing prognostic and decision-support tools to aid patients and clinicians in decision-making on resource use and escalation of care.
    Efficient Testable Learning of Halfspaces with Adversarial Label Noise. (arXiv:2303.05485v1 [cs.LG])
    We give the first polynomial-time algorithm for the testable learning of halfspaces in the presence of adversarial label noise under the Gaussian distribution. In the recently introduced testable learning model, one is required to produce a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data. Our tester-learner runs in time $\poly(d/\eps)$ and outputs a halfspace with misclassification error $O(\opt)+\eps$, where $\opt$ is the 0-1 error of the best fitting halfspace. At a technical level, our algorithm employs an iterative soft localization technique enhanced with appropriate testers to ensure that the data distribution is sufficiently similar to a Gaussian.
    Smoothed Analysis of Sequential Probability Assignment. (arXiv:2303.04845v1 [cs.LG])
    We initiate the study of smoothed analysis for the sequential probability assignment problem with contexts. We study information-theoretically optimal minmax rates as well as a framework for algorithmic reduction involving the maximum likelihood estimator oracle. Our approach establishes a general-purpose reduction from minimax rates for sequential probability assignment for smoothed adversaries to minimax rates for transductive learning. This leads to optimal (logarithmic) fast rates for parametric classes and classes with finite VC dimension. On the algorithmic front, we develop an algorithm that efficiently taps into the MLE oracle, for general classes of functions. We show that under general conditions this algorithmic approach yields sublinear regret.
    Generalization Bounds via Information Density and Conditional Information Density. (arXiv:2005.08044v6 [cs.LG] UPDATED)
    We present a general approach, based on exponential inequalities, to derive bounds on the generalization error of randomized learning algorithms. Using this approach, we provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios. Specifically, for the case of sub-Gaussian loss functions, we obtain novel bounds that depend on the information density between the training data and the output hypothesis. When suitably weakened, these bounds recover many of the information-theoretic bounds available in the literature. We also extend the proposed exponential-inequality approach to the setting recently introduced by Steinke and Zakynthinou (2020), where the learning algorithm depends on a randomly selected subset of the available training data. For this setup, we present bounds for bounded loss functions in terms of the conditional information density between the output hypothesis and the random variable determining the subset choice, given all training data. Through our approach, we recover the average generalization bound presented by Steinke and Zakynthinou (2020) and extend it to the PAC-Bayesian and single-draw scenarios. For the single-draw scenario, we also obtain novel bounds in terms of the conditional $\alpha$-mutual information and the conditional maximal leakage.
    Agnostic PAC Learning of k-juntas Using L2-Polynomial Regression. (arXiv:2303.04859v1 [cs.LG])
    Many conventional learning algorithms rely on loss functions other than the natural 0-1 loss for computational efficiency and theoretical tractability. Among them are approaches based on absolute loss (L1 regression) and square loss (L2 regression). The first is proved to be an \textit{agnostic} PAC learner for various important concept classes such as \textit{juntas}, and \textit{half-spaces}. On the other hand, the second is preferable because of its computational efficiency, which is linear in the sample size. However, PAC learnability is still unknown as guarantees have been proved only under distributional restrictions. The question of whether L2 regression is an agnostic PAC learner for 0-1 loss has been open since 1993 and yet has to be answered. This paper resolves this problem for the junta class on the Boolean cube -- proving agnostic PAC learning of k-juntas using L2 polynomial regression. Moreover, we present a new PAC learning algorithm based on the Boolean Fourier expansion with lower computational complexity. Fourier-based algorithms, such as Linial et al. (1993), have been used under distributional restrictions, such as uniform distribution. We show that with an appropriate change, one can apply those algorithms in agnostic settings without any distributional assumption. We prove our results by connecting the PAC learning with 0-1 loss to the minimum mean square estimation (MMSE) problem. We derive an elegant upper bound on the 0-1 loss in terms of the MMSE error and show that the sign of the MMSE is a PAC learner for any concept class containing it.
    Penalized Deep Partially Linear Cox Models with Application to CT Scans of Lung Cancer Patients. (arXiv:2303.05341v1 [stat.ML])
    Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective patient-centered therapies. The National Lung Screening Trial (NLST) was a nationwide study aimed at investigating risk factors for lung cancer. The study employed computed tomography texture analysis (CTTA), which provides objective measurements of texture patterns on CT scans, to quantify the mortality risks of lung cancer patients. Partially linear Cox models are becoming a popular tool for modeling survival outcomes, as they effectively handle both established risk factors (such as age and other clinical factors) and new risk factors (such as image features) in a single framework. The challenge in identifying the texture features that impact cancer survival is due to their sensitivity to factors such as scanner type, segmentation, and organ motion. To overcome this challenge, we propose a novel Penalized Deep Partially Linear Cox Model (Penalized DPLC), which incorporates the SCAD penalty to select significant texture features and employs a deep neural network to estimate the nonparametric component of the model accurately. We prove the convergence and asymptotic properties of the estimator and compare it to other methods through extensive simulation studies, evaluating its performance in risk prediction and feature selection. The proposed method is applied to the NLST study dataset to uncover the effects of key clinical and imaging risk factors on patients' survival. Our findings provide valuable insights into the relationship between these factors and survival outcomes.
    TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization. (arXiv:2303.05506v1 [cs.LG])
    Despite their success with unstructured data, deep neural networks are not yet a panacea for structured tabular data. In the tabular domain, their efficiency crucially relies on various forms of regularization to prevent overfitting and provide strong generalization performance. Existing regularization techniques include broad modelling decisions such as choice of architecture, loss functions, and optimization methods. In this work, we introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS), a novel framework for regularization in the tabular setting built on latent unit attributions. The gradient attribution of an activation with respect to a given input feature suggests how the neuron attends to that feature, and is often employed to interpret the predictions of deep networks. In TANGOS, we take a different approach and incorporate neuron attributions directly into training to encourage orthogonalization and specialization of latent attributions in a fully-connected network. Our regularizer encourages neurons to focus on sparse, non-overlapping input features and results in a set of diverse and specialized latent units. In the tabular domain, we demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods. We provide insight into why our regularizer is effective and demonstrate that TANGOS can be applied jointly with existing methods to achieve even greater generalization performance.

  • Open

    Anyone else doing Reinforcement Learning in finance?
    Hey y'all! So for like almost a whole year now, I've been messin' with algorithms that use reinforcement learnin' to trade stocks. I had to do some crazy stuff like processin' data, tweakin' hyper-parameters, integratin' live feeds, usin' XAI, and lots more. So, I was wonderin' if any of ya'll out there are doin' somethin' similar? I'd love to see how you're tacklin' this topic too! If you are, hit me up in the comments or PMs so we can talk shop! submitted by /u/Yahentamitsi [link] [comments]  ( 41 min )
    Can I feed by dqn semi optimal actions to make it identify better actions for states faster?
    I have a dqn built to play a game, but it is very unlikely to randomly turn pick the best action sequence on its own. The game is capable of looking ahead to see the optimal immediate score of an action. Would it be useful to randomly feed these semi optimal actions into my NN? submitted by /u/PainisPingas [link] [comments]  ( 41 min )
    Finally my bird is capturing the sky.
    submitted by /u/dharambir_iitk [link] [comments]  ( 41 min )
    Novel Methods to modify the Bellman Optimality Equations(BOE) to lessen overestimation bias
    Hi everyone, I am working on a research project where a large chunk of the problem is the title, where we have been left to brainstorm lol, I did some digging, and found papers on Conservative Q Functions and Clipped and Truncated Q Learning, but am still drawing a blank on something completely new to the BOE. Would love to hear some of your ideas, even theoretical concepts! Thanks. submitted by /u/TittyMcSwag619 [link] [comments]  ( 41 min )
  • Open

    isn't it scary how chat gpt lies to us when not jailbroken
    submitted by /u/Snoo85321 [link] [comments]  ( 6 min )
    What are 5 AI tools anyone just starting into this ai revolution should know about?
    submitted by /u/callhimV [link] [comments]  ( 41 min )
    AI Dream 174 - This FREE A.I. Tool Will Change Youtube Forever
    submitted by /u/LordPewPew777 [link] [comments]  ( 41 min )
    The Real Danger of AI (It has already happened)
    The real struggle of humanity is between the rich and the poor The rich are always looking to expand their influence and control upon the poor If the rich could own everyone like slaves, having them work from dawn until dusk to produce for them, they would However, this does not occur (at least in America) because the poor have overwhelming numbers compared to the rich The poor can rise up and overthrow the rich with physical force, if they band together and feel as if their conditions are bad enough And so, the struggle is always between the rich trying to exert more control on the poor, and the poor rebelling against the rich The rich are smart because they understand that they cannot control the poor using physical force The simple fact is that the poor have too many numbers, and…  ( 45 min )
    This week in AI - a summary of major news around AI
    Privacy-focused search engine DuckDuckGo has launched a beta version of its AI-powered summarization feature called DuckAssist, which can answer straightforward search queries using natural language technology from OpenAI and AI startup Anthropic, combined with the company's own active indexing of Wikipedia and other reference sites. Salesforce and OpenAI introduced the ChatGPT app for Slack. Built by OpenAI on the Slack platform, the app integrates ChatGPT’s powerful AI technology to deliver instant conversation summaries, research tools, and writing assistance directly in Slack Stability AI, the open source generative AI company, has acquired Init ML, makers of the popular imaging tool Clipdrop. Discord is introducing new AI experiences to its platform. These include: updating its b…  ( 43 min )
    Get Ready For Next Week: ChatGPT-4 Is Coming With Mind-Blowing Capabilities!
    submitted by /u/liquidocelotYT [link] [comments]  ( 41 min )
    The Computer Scientist Taking on Big Tech: Privacy, Lies and AI
    submitted by /u/MsNunez [link] [comments]  ( 41 min )
    GPT-4 reveal: Microsoft won't comment on launch rumors
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 41 min )
    Can we get David Attenborough to narrate nature series for all time?
    I love watching nature documentaries narrated by David, but one day he can’t make them anymore. The question in the title is too specific to answer, but in general, can someone give a company the right to use their voice for such work? Even after they passed away? submitted by /u/Zephyp [link] [comments]  ( 41 min )
    Text-To-Speech AIs Are Dangerously Good Now
    submitted by /u/bukowski3000 [link] [comments]  ( 41 min )
    The Canadian Genius That Created Modern AI – Geoff Hinton
    submitted by /u/webmanpt [link] [comments]  ( 41 min )
    New Book: Philosopher talks with GPT persona for over one year (open access)
    submitted by /u/picardstrikesback [link] [comments]  ( 41 min )
    We are entering a whole new world 🤯 Add the physical movement capabilities of the Boston Dynamics robots and give the speech models a few more years to evolve - voila, we have Ex Machina.
    submitted by /u/Parth-Prajapati [link] [comments]  ( 43 min )
    I made an AI-powered drawing app that turns your doodles into stunning art
    submitted by /u/catalinghita8 [link] [comments]  ( 42 min )
  • Open

    [D] What Improvements Accelerate the AI field Multiple orders of magnitude every year?
    These are just my perspectives, I am curious to hear how other people see it in the comments. From my perspective there are the following improvements that accelerate AI reserch with multiple orders of magnitude every year: 1.) Low barrier to entrance for researchers as hugging face, kaggle, google colab gives you free resources (CPU,RAM,GPU,TPU) to study 2.) More efficient models: with smaller models reproducing similar results as larger counterpart a good example is Open AI DALL-E vs stable diffusion. 3.) More efficient techniques: Ex changing computation from FP32 -> FP 16 in Nvidia GPUs 4.) Cleaner better labeled data by the community 4.) More efficient underlying programing language optimizations 5.) Rewritten more efficient code 6.) New hardware 7.) Special purpose hardware (while for gaming and other general purpose benchmarks there are 20-30% improvements every year or every 2 years) for AI reserch TENSOR cores (Nvidia GPUs, Google Cloud TPUs) or apple's Neural engines are orders of magnitude of speed improvement for AI models. Or many supercomputers are ARM based (that is not fully related to here but overall great architectural changes). 8.) New hardware types: analog processors might make a comeback soon that helps calculate floating point operations faster for neural nets. (others: Intelligence Processing Unit, Hogel processing unit (HPU) ) 9.) Just the number of new professionals/researchers entering different fields of the AI game. University Majors, online courses, jobs ... 10.) Money/funding. 11.) Becoming culturally mainstream, non professionals realizing that they use it every day. submitted by /u/glassAlloy [link] [comments]  ( 45 min )
    [D] What's the Time and Space Complexity of Transformer Models Inference?
    What's the Big (O) at inference time for transformer models? Is it different for BERT? RoBERTa? T5? DeBERTa? submitted by /u/Smooth-Earth-9897 [link] [comments]  ( 44 min )
    [D] Subreddit with AI tools only
    I created a subreddit where I post a new AI tool every hour. I thought it would be useful to gather them all in one place on Reddit, so they don't get lost among the multitude of AI subreddits and topics: https://www.reddit.com/r/AItoolsCatalog/ If you have an amazing project that you'd like to share or if you want to suggest one that in your opinion should be included, feel free to do so. submitted by /u/bart_so [link] [comments]  ( 43 min )
    [D] What are the Inputs to a Model That Plays Dynamic RTS Games Like StarCraft?
    I am familiar with writing networks to play games that have very defined inputs such as Snake or tic tac toe. But what are the inputs for games where units and buildings are constantly being spawned/destroyed? I assume the amount of parameters in the input layer cant be dynamically changing so how do the models handle this? Whats the input difference from a game state with 5 enemies revealed vs. a game state with 100 units revealed? I assume there is a lot of "hand waiving" going on in the input layer and its not getting the position of every unit in the game but Im not sure. Any insight would be great! submitted by /u/FlamingUnicorns [link] [comments]  ( 43 min )
    [P] Frouros: A Python library for drift detection in Machine Learning problems
    Hey everyone! I want to share with you an open-source library that we've been building for a while. Frouros: A Python library for drift detection in machine learning problems. https://github.com/IFCA/frouros Frouros implements multiple methods capable of detecting both concept and data drift with a simple, flexible and extendable API. It is intended to be used in conjunction with any machine learning library/framework, therefore is framework-agnostic, although it could also be used for non machine learning problems. Moreover, Frouros offers the well-known concept of callbacks that is included in libraries like Keras or PyTorch Lightning. This makes it simple to run custom user code at certain points (e.g., on_drift_detected, on_update_start, on_update_end). We are currently working on including more examples in the documentation to show what can be done with Frouros. I would appreciate any feedback you could provide us! submitted by /u/Ill_Relationship_547 [link] [comments]  ( 43 min )
    "[Project]" , "[Discussion]" Looking for suggestions for a model to use in an image similarity task
    I am currently working on my thesis on a dataset called DISC21. I am trying to achieve good results in the descriptor track (representing each image in a 256 dimensions vector). I tried to finetune ViTl16 (with only unfreezing the last 3 layers) model with only a subset of the training image due to my hardware limitations (I took 1000 original training images and generated 30 augmented images for each of them and started finetuning the model as if it was a classification task. after that I removed the dense layer I added for the 1000 class to extract features) I believe this training approach is wrong because I am training the model with augmented images without the model actually seeing the original image (the main goal of the dataset is to find the origin of an augmented image) I am as…  ( 51 min )
    [D] Is BERT going to be obsolete by ChatGPT?
    Search engines and chatbots are rapidly advancing with the introduction of ChatGPT, but search engines are still running on models like BERT (Bidirectional Encoder Representations from Transformers) for question-answer functionality. An interesting OpenVINO jupyter notebook #213 demonstrating question-answer functionality with a SQuAD v1.1 training set trained BERT model and transformer functions using either an embedded paragraph or a link to a website. I find it still really compelling to see how this works, basically how our widely used search engines function though on a much bigger scale. submitted by /u/JayMBurris [link] [comments]  ( 44 min )
    [P] Counterpoint - a generative model for Fugues and Chorales in the style of J.S. Bach (with samples)
    Samples can be found here and here. See how they compare to the original chorales and fugues. The model uses a Transformer encoder architecture to complete partially corrupted sequences representations of music. A version of Gibbs sampling is then used to construct new music from scratch. The entire model was trained in under 30 minutes on a single Tesla V100 - really showcasing the efficiency of Transformers in general. Note that the fugue samples are seeded by the first three bars of an actual Bach fugue. The chorales are generated completely from scratch! For more information on how it works - see the GitHub repo or follow me on Twitter. submitted by /u/ustainbolt [link] [comments]  ( 43 min )
    [P] RWKV 14B is a strong chatbot despite only trained on Pile (16G VRAM for 14B ctx4096 INT8, more optimizations incoming)
    The latest CharRWKV v2 has a new chat prompt (works for any topic), and here are some raw user chats with RWKV-4-Pile-14B-20230228-ctx4096-test663 model (topp=0.85, temp=1.0, presence penalty 0.2, frequency penalty 0.5). You are welcome to try ChatRWKV v2: https://github.com/BlinkDL/ChatRWKV And please keep in mind that RWKV is 100% RNN :) Pile v1 date cutoff is year 2020. Chat #1 Chat #2 ​ These are surprisingly good because RWKV is only trained on the Pile (and 100% RNN). No finetuning. No instruct tuning. No RLHF. You are welcome to try it. Update ChatRWKV v2 [and rwkv pip package] to latest version. Use https://huggingface.co/BlinkDL/rwkv-4-pile-14b/blob/main/RWKV-4-Pile-14B-20230228-ctx4096-test663.pth Run v2/chat.py and enjoy. ChatRWKV v2 supports INT8 now (with my crappy slow quantization, works for windows, supports any GPU, 16G VRAM for 14B if you offload final layer to CPU). And you can offload more layers to CPU to run it with 3G VRAM though that will be very slow :) More optimizations are coming. Or you can try the 7B model (less coherency) and 3B model (not very coherent, but still fun). submitted by /u/bo_peng [link] [comments]  ( 45 min )
    [D] Version 2.1 of the Open Deep Learning Toolkit for Robotics is already available!
    The latest version of the Open Deep Learning Toolkit for Robotics, Version 2.1 is already available ! This new version includes the following updates: New Features: Added Efficient LiDAR Panoptic Segmentation Added Nanodet 2D Object Detection tool Added C API implementations of NanoDet 2D Object Detection tool Added C API implementations of forward pass of DETR 2D Object Detection tool Added C API implementations of forward pass of DeepSORT 2D Object Tracking tool Added C API implementations of forward pass of Lightweight OpenPose, Pose Estimator tool Added C API implementations of forward pass of X3D 2D Activity Recognition tool Added C API implementations of forward pass of Progressive Spatiotemporal GCN Skeleton-based Action Recognition tool Added Binary High Resolution Analysis tool Added Multi-Object-Search tool Enhancements Added support in C API for detection target structure and vector of detections Added support in C API for tensor structure and vector of tensors Added support in C API for json parser You can download the toolkit here: - GitHub: https://github.com/opendr-eu/opendr - pip: https://pypi.org/project/opendr-toolkit/ - Docker Hub: https://hub.docker.com/r/opendr/opendr-toolkit/tags Looking forward for your comments and suggestions! submitted by /u/OpenDR_H2020_Project [link] [comments]  ( 43 min )
  • Open

    Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive
    Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Data Wrangler enables you to access data from a wide variety of popular sources (Amazon S3, Amazon Athena, Amazon Redshift, Amazon EMR and Snowflake) and over 40 other third-party sources. […]  ( 10 min )
    Using Amazon SageMaker with Point Clouds: Part 1- Ground Truth for 3D labeling
    In this two-part series, we demonstrate how to label and train models for 3D object detection tasks. In part 1, we discuss the dataset we’re using, as well as any preprocessing steps, to understand and label data. In part 2, we walk through how to train a model on your dataset and deploy it to […]  ( 13 min )
    Real-time fraud detection using AWS serverless and machine learning services
    Online fraud has a widespread impact on businesses and requires an effective end-to-end strategy to detect and prevent new account fraud and account takeovers, and stop suspicious payment transactions. In this post, we show a serverless approach to detect online transaction fraud in near-real time. We show how you can apply this approach to various data streaming and event-driven architectures, depending on the desired outcome and actions to take to prevent fraud (such as alert the user about the fraud or flag the transaction for additional review).  ( 7 min )
  • Open

    PaLM-E: An embodied multimodal language model
    Posted by Danny Driess, Student Researcher, and Pete Florence, Research Scientist, Robotics at Google Recent years have seen tremendous advances across machine learning domains, from models that can explain jokes or answer visual questions in a variety of languages to those that can produce images based on text descriptions. Such innovations have been possible due to the increase in availability of large scale datasets along with novel advances that enable the training of models on these data. While scaling of robotics models has seen some success, it is outpaced by other domains due to a lack of datasets available on a scale comparable to large text corpora or image datasets. Today we introduce PaLM-E, a new generalist robotics model that overcomes these issues by transferring know…  ( 93 min )
  • Open

    Plotting constellations
    Suppose you wanted to write a program to plot constellations. This leads down some interesting rabbit trails. When you look up data on stars in constellations you run into two meanings of constellation. For example, Leo is a region of the night sky containing an untold number of stars. It is also a pattern of […] Plotting constellations first appeared on John D. Cook.  ( 6 min )
  • Open

    MIT professor to Congress: “We are at an inflection point” with AI
    Aleksander Mądry urges lawmakers to ask rigorous questions about how AI tools are being used by corporations.  ( 8 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional metrics. Like a two-dimensional metric, a two-dimensional tensor also has $n$ number of rows and columns. Let’s take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Ranging from ‘0’ to ‘255’, each number represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations while a tensor can be a number, matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents The Jupyter+git problem The solution The nbdev2 git merge driver The nbdev2 Jupyter save hook Background The result Postscript: other Jupyter+git tools ReviewNB An alternative solution: Jupytext nbdime The Jupyter+git problem Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-04-09T00:51:26.596Z osmosfeed 1.15.1